Python (0506) Week 10

Pandas
Author

김보람

Published

May 6, 2022

!pip install numpy
!pip install pandas
import numpy as np
import pandas as pd

Extracting subsets of data: why use pandas?

Basic indexing

- Example 1: basic indexing

a='asdf'
a[2]
'd'
a[-1]
'f'

- Example 2: slicing

a='asdf'
a[1:3]
'sd'
a[-2:]
'df'

- Example 3: striding

a='afsdf'
a[::2]
'asf'

- Example 4: what is not possible

a='afsd'
a[[1,2]] # a string cannot be indexed with a list of integer indices
TypeError: string indices must be integers
a='afsd'
a[[True,True,False,True]] # a string cannot be indexed with a list of booleans either
TypeError: string indices must be integers
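
The two failing selections above can still be emulated on a plain string. The following is a minimal sketch of one possible workaround, not something from the lecture itself; the names `idx`, `mask`, `picked_by_index`, and `picked_by_mask` are illustrative.

```python
a = 'afsd'

# Emulate integer-list indexing: pick the characters at positions 1 and 2.
idx = [1, 2]
picked_by_index = ''.join(a[i] for i in idx)

# Emulate boolean-mask indexing: keep characters whose mask entry is True.
mask = [True, True, False, True]
picked_by_mask = ''.join(ch for ch, keep in zip(a, mask) if keep)

print(picked_by_index)  # fs
print(picked_by_mask)   # afd
```

Both versions need an explicit loop, which is the inconvenience that fancy indexing removes.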

Fancy indexing

- Example 1: pass a list (or ndarray) of indices

a=np.arange(5)
a[0]
0
a[[0,1]]
array([0, 1])
a[[0,1,-2]]
array([0, 1, 3])
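
One practical difference between slicing and fancy indexing is worth keeping in mind: in NumPy, basic slicing returns a view of the original array, while indexing with a list returns a copy. The sketch below illustrates this; the variable names `sliced` and `fancy` are illustrative.

```python
import numpy as np

a = np.arange(5)

sliced = a[0:2]    # basic slicing: a view onto a's memory
fancy = a[[0, 1]]  # fancy indexing: a new array (copy)

fancy[0] = 100     # modifies only the copy
sliced[1] = -1     # writes through to a

print(a.tolist())                   # [0, -1, 2, 3, 4]
print(np.shares_memory(a, sliced))  # True
print(np.shares_memory(a, fancy))   # False
```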

- Example 2: pass a list (or ndarray) of booleans

a=np.arange(55,61)
a
array([55, 56, 57, 58, 59, 60])
a[[True,True,False,True,True,False]]
array([55, 56, 58, 59])
a<58
array([ True,  True,  True, False, False, False])
a[a<58]
array([55, 56, 57])
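
Boolean masks can also be combined before indexing. The following is a small sketch, assuming the same array as above; the names `middle`, `edges`, and `not_small` are illustrative.

```python
import numpy as np

a = np.arange(55, 61)  # array([55, 56, 57, 58, 59, 60])

# Combine element-wise conditions with & (and), | (or), ~ (not).
# Parentheses are required because & and | bind tighter than comparisons.
middle = a[(a > 55) & (a < 59)]
edges = a[(a < 56) | (a > 59)]
not_small = a[~(a < 58)]

print(middle.tolist())     # [56, 57, 58]
print(edges.tolist())      # [55, 60]
print(not_small.tolist())  # [58, 59, 60]
```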

Indexing two-dimensional data

- Example 1

a=np.arange(4*3).reshape(4,3)
a
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
a[0:2,1]
array([1, 4])

- Example 2: what if you want to index while keeping the dimensions?

a[0:2,[1]]
array([[1],
       [4]])
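
Comparing shapes makes the difference explicit. The sketch below also shows a length-1 slice as another way to keep the column axis; this alternative is an addition for illustration, not part of the original example.

```python
import numpy as np

a = np.arange(4 * 3).reshape(4, 3)

col_1d = a[0:2, 1]       # integer index drops the column axis
col_list = a[0:2, [1]]   # list index keeps the axis
col_slice = a[0:2, 1:2]  # length-1 slice also keeps the axis

print(col_1d.shape)     # (2,)
print(col_list.shape)   # (2, 1)
print(col_slice.shape)  # (2, 1)
```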

HASH

- Example 1: (key, value)

d={'att':67, 'rep':45, 'mid':30, 'fin':100}
d
{'att': 67, 'rep': 45, 'mid': 30, 'fin': 100}
d['att'] # passing a key returns its value
67
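
A short sketch of what happens with a key that does not exist, and how to fetch several values at once. The key `'quiz'` is hypothetical and used only to trigger the missing-key case.

```python
d = {'att': 67, 'rep': 45, 'mid': 30, 'fin': 100}

# A missing key raises KeyError with square-bracket access.
try:
    d['quiz']  # 'quiz' is a hypothetical key that is not in d
    missing_raises = False
except KeyError:
    missing_raises = True

# dict.get returns a default value instead of raising.
quiz_score = d.get('quiz', 0)

# Several values can be collected with a comprehension over the keys.
exam_scores = [d[k] for k in ['mid', 'fin']]

print(missing_raises)  # True
print(quiz_score)      # 0
print(exam_scores)     # [30, 100]
```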

- Example 2: comparison with numpy

np.random.seed(43052)
att = np.random.choice(np.arange(10,21)*5,200)
rep = np.random.choice(np.arange(5,21)*5,200)
mid = np.random.choice(np.arange(0,21)*5,200)
fin = np.random.choice(np.arange(0,21)*5,200)
key = ['202212'+str(s) for s in np.random.choice(np.arange(300,501),200,replace=False)]
test_dic = {key[i] : {'att':att[i], 'rep':rep[i], 'mid':mid[i], 'fin':fin[i]} for i in range(200)}
test_ndarray = np.array([key,att,rep,mid,fin],dtype=np.int64).T
test_dic
{'202212377': {'att': 65, 'rep': 45, 'mid': 0, 'fin': 10},
 '202212473': {'att': 95, 'rep': 30, 'mid': 60, 'fin': 10},
 '202212310': {'att': 65, 'rep': 85, 'mid': 15, 'fin': 20},
 '202212460': {'att': 55, 'rep': 35, 'mid': 35, 'fin': 5},
 '202212320': {'att': 80, 'rep': 60, 'mid': 55, 'fin': 70},
 '202212329': {'att': 75, 'rep': 40, 'mid': 75, 'fin': 85},
 '202212408': {'att': 65, 'rep': 70, 'mid': 60, 'fin': 75},
 '202212319': {'att': 60, 'rep': 25, 'mid': 20, 'fin': 35},
 '202212348': {'att': 95, 'rep': 55, 'mid': 65, 'fin': 90},
 '202212306': {'att': 90, 'rep': 25, 'mid': 95, 'fin': 50},
 '202212308': {'att': 55, 'rep': 45, 'mid': 75, 'fin': 30},
 '202212366': {'att': 95, 'rep': 60, 'mid': 25, 'fin': 55},
 '202212367': {'att': 95, 'rep': 35, 'mid': 0, 'fin': 25},
 '202212461': {'att': 50, 'rep': 55, 'mid': 90, 'fin': 45},
 '202212354': {'att': 50, 'rep': 65, 'mid': 50, 'fin': 70},
 '202212361': {'att': 95, 'rep': 100, 'mid': 25, 'fin': 40},
 '202212400': {'att': 50, 'rep': 65, 'mid': 35, 'fin': 85},
 '202212490': {'att': 65, 'rep': 85, 'mid': 10, 'fin': 5},
 '202212404': {'att': 70, 'rep': 65, 'mid': 65, 'fin': 80},
 '202212326': {'att': 90, 'rep': 70, 'mid': 100, 'fin': 30},
 '202212452': {'att': 80, 'rep': 45, 'mid': 80, 'fin': 85},
 '202212362': {'att': 55, 'rep': 45, 'mid': 85, 'fin': 70},
 '202212396': {'att': 65, 'rep': 35, 'mid': 45, 'fin': 20},
 '202212356': {'att': 70, 'rep': 25, 'mid': 50, 'fin': 70},
 '202212305': {'att': 85, 'rep': 55, 'mid': 30, 'fin': 80},
 '202212398': {'att': 90, 'rep': 30, 'mid': 30, 'fin': 0},
 '202212410': {'att': 100, 'rep': 65, 'mid': 50, 'fin': 70},
 '202212385': {'att': 80, 'rep': 70, 'mid': 50, 'fin': 100},
 '202212430': {'att': 80, 'rep': 35, 'mid': 25, 'fin': 65},
 '202212498': {'att': 55, 'rep': 75, 'mid': 20, 'fin': 25},
 '202212423': {'att': 75, 'rep': 75, 'mid': 85, 'fin': 95},
 '202212327': {'att': 80, 'rep': 95, 'mid': 5, 'fin': 5},
 '202212347': {'att': 95, 'rep': 60, 'mid': 65, 'fin': 10},
 '202212483': {'att': 95, 'rep': 60, 'mid': 90, 'fin': 75},
 '202212447': {'att': 100, 'rep': 75, 'mid': 70, 'fin': 25},
 '202212496': {'att': 100, 'rep': 55, 'mid': 35, 'fin': 85},
 '202212358': {'att': 80, 'rep': 60, 'mid': 65, 'fin': 55},
 '202212399': {'att': 70, 'rep': 80, 'mid': 0, 'fin': 10},
 '202212459': {'att': 85, 'rep': 65, 'mid': 60, 'fin': 60},
 '202212313': {'att': 100, 'rep': 95, 'mid': 0, 'fin': 25},
 '202212304': {'att': 95, 'rep': 60, 'mid': 15, 'fin': 45},
 '202212431': {'att': 75, 'rep': 40, 'mid': 30, 'fin': 10},
 '202212325': {'att': 70, 'rep': 80, 'mid': 50, 'fin': 25},
 '202212471': {'att': 50, 'rep': 45, 'mid': 10, 'fin': 10},
 '202212463': {'att': 100, 'rep': 100, 'mid': 100, 'fin': 50},
 '202212441': {'att': 75, 'rep': 50, 'mid': 60, 'fin': 5},
 '202212445': {'att': 85, 'rep': 50, 'mid': 35, 'fin': 100},
 '202212323': {'att': 80, 'rep': 35, 'mid': 75, 'fin': 80},
 '202212442': {'att': 95, 'rep': 45, 'mid': 35, 'fin': 80},
 '202212346': {'att': 65, 'rep': 85, 'mid': 85, 'fin': 15},
 '202212411': {'att': 90, 'rep': 30, 'mid': 25, 'fin': 5},
 '202212468': {'att': 65, 'rep': 65, 'mid': 35, 'fin': 70},
 '202212331': {'att': 80, 'rep': 65, 'mid': 30, 'fin': 90},
 '202212345': {'att': 95, 'rep': 80, 'mid': 45, 'fin': 35},
 '202212339': {'att': 65, 'rep': 75, 'mid': 50, 'fin': 35},
 '202212383': {'att': 90, 'rep': 55, 'mid': 100, 'fin': 30},
 '202212462': {'att': 95, 'rep': 25, 'mid': 95, 'fin': 90},
 '202212344': {'att': 100, 'rep': 50, 'mid': 80, 'fin': 10},
 '202212472': {'att': 50, 'rep': 55, 'mid': 35, 'fin': 60},
 '202212437': {'att': 90, 'rep': 70, 'mid': 35, 'fin': 25},
 '202212336': {'att': 50, 'rep': 55, 'mid': 15, 'fin': 75},
 '202212438': {'att': 80, 'rep': 50, 'mid': 55, 'fin': 90},
 '202212454': {'att': 50, 'rep': 75, 'mid': 65, 'fin': 90},
 '202212384': {'att': 70, 'rep': 40, 'mid': 90, 'fin': 5},
 '202212402': {'att': 65, 'rep': 85, 'mid': 20, 'fin': 90},
 '202212397': {'att': 60, 'rep': 30, 'mid': 0, 'fin': 50},
 '202212318': {'att': 50, 'rep': 65, 'mid': 15, 'fin': 0},
 '202212371': {'att': 60, 'rep': 95, 'mid': 30, 'fin': 70},
 '202212469': {'att': 70, 'rep': 70, 'mid': 5, 'fin': 0},
 '202212379': {'att': 75, 'rep': 45, 'mid': 15, 'fin': 75},
 '202212364': {'att': 50, 'rep': 60, 'mid': 15, 'fin': 50},
 '202212450': {'att': 85, 'rep': 90, 'mid': 90, 'fin': 90},
 '202212337': {'att': 80, 'rep': 25, 'mid': 85, 'fin': 20},
 '202212458': {'att': 55, 'rep': 75, 'mid': 95, 'fin': 90},
 '202212494': {'att': 85, 'rep': 30, 'mid': 45, 'fin': 15},
 '202212478': {'att': 65, 'rep': 30, 'mid': 45, 'fin': 15},
 '202212373': {'att': 85, 'rep': 95, 'mid': 35, 'fin': 25},
 '202212474': {'att': 60, 'rep': 25, 'mid': 10, 'fin': 50},
 '202212455': {'att': 95, 'rep': 45, 'mid': 90, 'fin': 35},
 '202212317': {'att': 85, 'rep': 50, 'mid': 60, 'fin': 45},
 '202212341': {'att': 60, 'rep': 50, 'mid': 100, 'fin': 70},
 '202212386': {'att': 100, 'rep': 75, 'mid': 60, 'fin': 0},
 '202212328': {'att': 100, 'rep': 90, 'mid': 85, 'fin': 75},
 '202212417': {'att': 55, 'rep': 100, 'mid': 100, 'fin': 60},
 '202212370': {'att': 70, 'rep': 60, 'mid': 30, 'fin': 40},
 '202212486': {'att': 70, 'rep': 90, 'mid': 95, 'fin': 40},
 '202212333': {'att': 55, 'rep': 50, 'mid': 0, 'fin': 5},
 '202212360': {'att': 100, 'rep': 100, 'mid': 45, 'fin': 90},
 '202212350': {'att': 85, 'rep': 70, 'mid': 90, 'fin': 80},
 '202212382': {'att': 100, 'rep': 85, 'mid': 65, 'fin': 85},
 '202212392': {'att': 60, 'rep': 65, 'mid': 35, 'fin': 15},
 '202212449': {'att': 65, 'rep': 75, 'mid': 75, 'fin': 85},
 '202212394': {'att': 65, 'rep': 25, 'mid': 40, 'fin': 0},
 '202212444': {'att': 75, 'rep': 75, 'mid': 50, 'fin': 40},
 '202212487': {'att': 50, 'rep': 55, 'mid': 80, 'fin': 55},
 '202212425': {'att': 75, 'rep': 30, 'mid': 20, 'fin': 50},
 '202212312': {'att': 100, 'rep': 50, 'mid': 25, 'fin': 65},
 '202212448': {'att': 90, 'rep': 30, 'mid': 95, 'fin': 35},
 '202212434': {'att': 55, 'rep': 100, 'mid': 80, 'fin': 0},
 '202212451': {'att': 75, 'rep': 60, 'mid': 15, 'fin': 40},
 '202212433': {'att': 60, 'rep': 25, 'mid': 25, 'fin': 50},
 '202212424': {'att': 85, 'rep': 35, 'mid': 10, 'fin': 60},
 '202212351': {'att': 60, 'rep': 100, 'mid': 55, 'fin': 40},
 '202212324': {'att': 70, 'rep': 55, 'mid': 50, 'fin': 75},
 '202212314': {'att': 80, 'rep': 65, 'mid': 95, 'fin': 85},
 '202212446': {'att': 65, 'rep': 35, 'mid': 15, 'fin': 65},
 '202212401': {'att': 85, 'rep': 70, 'mid': 100, 'fin': 0},
 '202212307': {'att': 100, 'rep': 30, 'mid': 60, 'fin': 65},
 '202212300': {'att': 65, 'rep': 70, 'mid': 55, 'fin': 70},
 '202212342': {'att': 85, 'rep': 55, 'mid': 85, 'fin': 90},
 '202212479': {'att': 85, 'rep': 95, 'mid': 80, 'fin': 10},
 '202212443': {'att': 85, 'rep': 70, 'mid': 75, 'fin': 5},
 '202212387': {'att': 100, 'rep': 35, 'mid': 70, 'fin': 0},
 '202212372': {'att': 95, 'rep': 45, 'mid': 55, 'fin': 65},
 '202212376': {'att': 95, 'rep': 85, 'mid': 40, 'fin': 65},
 '202212466': {'att': 55, 'rep': 50, 'mid': 30, 'fin': 85},
 '202212391': {'att': 85, 'rep': 50, 'mid': 5, 'fin': 65},
 '202212368': {'att': 75, 'rep': 90, 'mid': 85, 'fin': 85},
 '202212427': {'att': 95, 'rep': 70, 'mid': 10, 'fin': 5},
 '202212414': {'att': 85, 'rep': 35, 'mid': 80, 'fin': 95},
 '202212426': {'att': 95, 'rep': 50, 'mid': 80, 'fin': 90},
 '202212316': {'att': 100, 'rep': 65, 'mid': 75, 'fin': 40},
 '202212355': {'att': 95, 'rep': 70, 'mid': 70, 'fin': 0},
 '202212477': {'att': 95, 'rep': 70, 'mid': 20, 'fin': 25},
 '202212484': {'att': 100, 'rep': 60, 'mid': 10, 'fin': 5},
 '202212456': {'att': 55, 'rep': 35, 'mid': 25, 'fin': 10},
 '202212500': {'att': 60, 'rep': 90, 'mid': 40, 'fin': 5},
 '202212381': {'att': 85, 'rep': 90, 'mid': 85, 'fin': 75},
 '202212335': {'att': 75, 'rep': 85, 'mid': 25, 'fin': 35},
 '202212475': {'att': 55, 'rep': 30, 'mid': 50, 'fin': 45},
 '202212343': {'att': 70, 'rep': 60, 'mid': 75, 'fin': 75},
 '202212412': {'att': 80, 'rep': 30, 'mid': 95, 'fin': 5},
 '202212428': {'att': 90, 'rep': 85, 'mid': 80, 'fin': 15},
 '202212330': {'att': 90, 'rep': 25, 'mid': 95, 'fin': 5},
 '202212375': {'att': 60, 'rep': 85, 'mid': 50, 'fin': 20},
 '202212413': {'att': 90, 'rep': 50, 'mid': 95, 'fin': 95},
 '202212303': {'att': 75, 'rep': 95, 'mid': 65, 'fin': 40},
 '202212374': {'att': 60, 'rep': 40, 'mid': 35, 'fin': 0},
 '202212409': {'att': 55, 'rep': 100, 'mid': 15, 'fin': 80},
 '202212440': {'att': 70, 'rep': 75, 'mid': 80, 'fin': 0},
 '202212393': {'att': 75, 'rep': 65, 'mid': 25, 'fin': 20},
 '202212492': {'att': 90, 'rep': 75, 'mid': 80, 'fin': 25},
 '202212357': {'att': 50, 'rep': 75, 'mid': 75, 'fin': 20},
 '202212465': {'att': 55, 'rep': 45, 'mid': 35, 'fin': 45},
 '202212415': {'att': 90, 'rep': 70, 'mid': 90, 'fin': 0},
 '202212405': {'att': 75, 'rep': 30, 'mid': 100, 'fin': 60},
 '202212435': {'att': 90, 'rep': 85, 'mid': 0, 'fin': 40},
 '202212380': {'att': 85, 'rep': 70, 'mid': 35, 'fin': 0},
 '202212369': {'att': 100, 'rep': 75, 'mid': 100, 'fin': 85},
 '202212467': {'att': 55, 'rep': 35, 'mid': 20, 'fin': 10},
 '202212429': {'att': 70, 'rep': 75, 'mid': 90, 'fin': 90},
 '202212495': {'att': 90, 'rep': 90, 'mid': 55, 'fin': 55},
 '202212420': {'att': 55, 'rep': 60, 'mid': 40, 'fin': 0},
 '202212302': {'att': 100, 'rep': 90, 'mid': 5, 'fin': 30},
 '202212481': {'att': 50, 'rep': 55, 'mid': 25, 'fin': 80},
 '202212422': {'att': 100, 'rep': 100, 'mid': 90, 'fin': 55},
 '202212388': {'att': 70, 'rep': 45, 'mid': 70, 'fin': 75},
 '202212480': {'att': 85, 'rep': 95, 'mid': 85, 'fin': 90},
 '202212378': {'att': 55, 'rep': 25, 'mid': 95, 'fin': 45},
 '202212457': {'att': 75, 'rep': 30, 'mid': 10, 'fin': 95},
 '202212419': {'att': 65, 'rep': 85, 'mid': 15, 'fin': 60},
 '202212432': {'att': 70, 'rep': 90, 'mid': 70, 'fin': 0},
 '202212395': {'att': 60, 'rep': 85, 'mid': 70, 'fin': 85},
 '202212464': {'att': 100, 'rep': 25, 'mid': 10, 'fin': 20},
 '202212476': {'att': 75, 'rep': 25, 'mid': 80, 'fin': 25},
 '202212332': {'att': 90, 'rep': 95, 'mid': 40, 'fin': 80},
 '202212301': {'att': 95, 'rep': 90, 'mid': 50, 'fin': 50},
 '202212497': {'att': 90, 'rep': 90, 'mid': 65, 'fin': 85},
 '202212309': {'att': 95, 'rep': 75, 'mid': 50, 'fin': 40},
 '202212493': {'att': 55, 'rep': 60, 'mid': 70, 'fin': 5},
 '202212311': {'att': 95, 'rep': 85, 'mid': 0, 'fin': 15},
 '202212416': {'att': 65, 'rep': 60, 'mid': 35, 'fin': 20},
 '202212489': {'att': 65, 'rep': 50, 'mid': 5, 'fin': 5},
 '202212359': {'att': 90, 'rep': 25, 'mid': 60, 'fin': 25},
 '202212349': {'att': 100, 'rep': 40, 'mid': 40, 'fin': 15},
 '202212403': {'att': 70, 'rep': 25, 'mid': 100, 'fin': 75},
 '202212418': {'att': 100, 'rep': 30, 'mid': 70, 'fin': 70},
 '202212406': {'att': 50, 'rep': 55, 'mid': 55, 'fin': 5},
 '202212485': {'att': 70, 'rep': 35, 'mid': 70, 'fin': 100},
 '202212390': {'att': 70, 'rep': 60, 'mid': 60, 'fin': 80},
 '202212365': {'att': 55, 'rep': 45, 'mid': 90, 'fin': 5},
 '202212338': {'att': 55, 'rep': 55, 'mid': 10, 'fin': 95},
 '202212363': {'att': 65, 'rep': 80, 'mid': 10, 'fin': 30},
 '202212321': {'att': 90, 'rep': 25, 'mid': 35, 'fin': 55},
 '202212499': {'att': 100, 'rep': 30, 'mid': 30, 'fin': 85},
 '202212340': {'att': 70, 'rep': 85, 'mid': 70, 'fin': 65},
 '202212421': {'att': 60, 'rep': 100, 'mid': 45, 'fin': 100},
 '202212407': {'att': 70, 'rep': 25, 'mid': 100, 'fin': 15},
 '202212439': {'att': 70, 'rep': 35, 'mid': 80, 'fin': 25},
 '202212488': {'att': 65, 'rep': 60, 'mid': 30, 'fin': 35},
 '202212453': {'att': 95, 'rep': 35, 'mid': 40, 'fin': 95},
 '202212482': {'att': 50, 'rep': 80, 'mid': 65, 'fin': 90},
 '202212334': {'att': 100, 'rep': 40, 'mid': 80, 'fin': 80},
 '202212322': {'att': 55, 'rep': 30, 'mid': 95, 'fin': 100},
 '202212353': {'att': 65, 'rep': 40, 'mid': 65, 'fin': 70},
 '202212491': {'att': 55, 'rep': 70, 'mid': 40, 'fin': 95},
 '202212352': {'att': 65, 'rep': 85, 'mid': 25, 'fin': 85},
 '202212315': {'att': 85, 'rep': 85, 'mid': 100, 'fin': 10},
 '202212470': {'att': 80, 'rep': 65, 'mid': 35, 'fin': 60},
 '202212436': {'att': 50, 'rep': 95, 'mid': 45, 'fin': 85}}

What if we want the attendance score of the student with ID '202212460'?

(Solution 1)

test_dic['202212460']['att']
55
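
The dictionary handles a single key lookup well, but selecting students by a condition requires an explicit loop. The sketch below uses a four-student excerpt copied from the output above; `small_dic` and the threshold of 65 are illustrative assumptions.

```python
# A four-student excerpt of test_dic, copied from the output above.
small_dic = {
    '202212377': {'att': 65, 'rep': 45, 'mid': 0, 'fin': 10},
    '202212473': {'att': 95, 'rep': 30, 'mid': 60, 'fin': 10},
    '202212310': {'att': 65, 'rep': 85, 'mid': 15, 'fin': 20},
    '202212460': {'att': 55, 'rep': 35, 'mid': 35, 'fin': 5},
}

# Single lookup: student ID first, then the score name.
att_460 = small_dic['202212460']['att']

# Filtering by a condition needs a loop over the items.
high_att = [sid for sid, rec in small_dic.items() if rec['att'] >= 65]

print(att_460)   # 55
print(high_att)  # ['202212377', '202212473', '202212310']
```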

(Solution 2)

np.array([key,att,rep,mid,fin]) # without dtype and .T: one row per field, and every entry becomes a string
array([['202212377', '202212473', '202212310', '202212460', '202212320',
        '202212329', '202212408', '202212319', '202212348', '202212306',
        '202212308', '202212366', '202212367', '202212461', '202212354',
        '202212361', '202212400', '202212490', '202212404', '202212326',
        '202212452', '202212362', '202212396', '202212356', '202212305',
        '202212398', '202212410', '202212385', '202212430', '202212498',
        '202212423', '202212327', '202212347', '202212483', '202212447',
        '202212496', '202212358', '202212399', '202212459', '202212313',
        '202212304', '202212431', '202212325', '202212471', '202212463',
        '202212441', '202212445', '202212323', '202212442', '202212346',
        '202212411', '202212468', '202212331', '202212345', '202212339',
        '202212383', '202212462', '202212344', '202212472', '202212437',
        '202212336', '202212438', '202212454', '202212384', '202212402',
        '202212397', '202212318', '202212371', '202212469', '202212379',
        '202212364', '202212450', '202212337', '202212458', '202212494',
        '202212478', '202212373', '202212474', '202212455', '202212317',
        '202212341', '202212386', '202212328', '202212417', '202212370',
        '202212486', '202212333', '202212360', '202212350', '202212382',
        '202212392', '202212449', '202212394', '202212444', '202212487',
        '202212425', '202212312', '202212448', '202212434', '202212451',
        '202212433', '202212424', '202212351', '202212324', '202212314',
        '202212446', '202212401', '202212307', '202212300', '202212342',
        '202212479', '202212443', '202212387', '202212372', '202212376',
        '202212466', '202212391', '202212368', '202212427', '202212414',
        '202212426', '202212316', '202212355', '202212477', '202212484',
        '202212456', '202212500', '202212381', '202212335', '202212475',
        '202212343', '202212412', '202212428', '202212330', '202212375',
        '202212413', '202212303', '202212374', '202212409', '202212440',
        '202212393', '202212492', '202212357', '202212465', '202212415',
        '202212405', '202212435', '202212380', '202212369', '202212467',
        '202212429', '202212495', '202212420', '202212302', '202212481',
        '202212422', '202212388', '202212480', '202212378', '202212457',
        '202212419', '202212432', '202212395', '202212464', '202212476',
        '202212332', '202212301', '202212497', '202212309', '202212493',
        '202212311', '202212416', '202212489', '202212359', '202212349',
        '202212403', '202212418', '202212406', '202212485', '202212390',
        '202212365', '202212338', '202212363', '202212321', '202212499',
        '202212340', '202212421', '202212407', '202212439', '202212488',
        '202212453', '202212482', '202212334', '202212322', '202212353',
        '202212491', '202212352', '202212315', '202212470', '202212436'],
       ['65', '95', '65', '55', '80', '75', '65', '60', '95', '90', '55',
        '95', '95', '50', '50', '95', '50', '65', '70', '90', '80', '55',
        '65', '70', '85', '90', '100', '80', '80', '55', '75', '80',
        '95', '95', '100', '100', '80', '70', '85', '100', '95', '75',
        '70', '50', '100', '75', '85', '80', '95', '65', '90', '65',
        '80', '95', '65', '90', '95', '100', '50', '90', '50', '80',
        '50', '70', '65', '60', '50', '60', '70', '75', '50', '85', '80',
        '55', '85', '65', '85', '60', '95', '85', '60', '100', '100',
        '55', '70', '70', '55', '100', '85', '100', '60', '65', '65',
        '75', '50', '75', '100', '90', '55', '75', '60', '85', '60',
        '70', '80', '65', '85', '100', '65', '85', '85', '85', '100',
        '95', '95', '55', '85', '75', '95', '85', '95', '100', '95',
        '95', '100', '55', '60', '85', '75', '55', '70', '80', '90',
        '90', '60', '90', '75', '60', '55', '70', '75', '90', '50', '55',
        '90', '75', '90', '85', '100', '55', '70', '90', '55', '100',
        '50', '100', '70', '85', '55', '75', '65', '70', '60', '100',
        '75', '90', '95', '90', '95', '55', '95', '65', '65', '90',
        '100', '70', '100', '50', '70', '70', '55', '55', '65', '90',
        '100', '70', '60', '70', '70', '65', '95', '50', '100', '55',
        '65', '55', '65', '85', '80', '50'],
       ['45', '30', '85', '35', '60', '40', '70', '25', '55', '25', '45',
        '60', '35', '55', '65', '100', '65', '85', '65', '70', '45',
        '45', '35', '25', '55', '30', '65', '70', '35', '75', '75', '95',
        '60', '60', '75', '55', '60', '80', '65', '95', '60', '40', '80',
        '45', '100', '50', '50', '35', '45', '85', '30', '65', '65',
        '80', '75', '55', '25', '50', '55', '70', '55', '50', '75', '40',
        '85', '30', '65', '95', '70', '45', '60', '90', '25', '75', '30',
        '30', '95', '25', '45', '50', '50', '75', '90', '100', '60',
        '90', '50', '100', '70', '85', '65', '75', '25', '75', '55',
        '30', '50', '30', '100', '60', '25', '35', '100', '55', '65',
        '35', '70', '30', '70', '55', '95', '70', '35', '45', '85', '50',
        '50', '90', '70', '35', '50', '65', '70', '70', '60', '35', '90',
        '90', '85', '30', '60', '30', '85', '25', '85', '50', '95', '40',
        '100', '75', '65', '75', '75', '45', '70', '30', '85', '70',
        '75', '35', '75', '90', '60', '90', '55', '100', '45', '95',
        '25', '30', '85', '90', '85', '25', '25', '95', '90', '90', '75',
        '60', '85', '60', '50', '25', '40', '25', '30', '55', '35', '60',
        '45', '55', '80', '25', '30', '85', '100', '25', '35', '60',
        '35', '80', '40', '30', '40', '70', '85', '85', '65', '95'],
       ['0', '60', '15', '35', '55', '75', '60', '20', '65', '95', '75',
        '25', '0', '90', '50', '25', '35', '10', '65', '100', '80', '85',
        '45', '50', '30', '30', '50', '50', '25', '20', '85', '5', '65',
        '90', '70', '35', '65', '0', '60', '0', '15', '30', '50', '10',
        '100', '60', '35', '75', '35', '85', '25', '35', '30', '45',
        '50', '100', '95', '80', '35', '35', '15', '55', '65', '90',
        '20', '0', '15', '30', '5', '15', '15', '90', '85', '95', '45',
        '45', '35', '10', '90', '60', '100', '60', '85', '100', '30',
        '95', '0', '45', '90', '65', '35', '75', '40', '50', '80', '20',
        '25', '95', '80', '15', '25', '10', '55', '50', '95', '15',
        '100', '60', '55', '85', '80', '75', '70', '55', '40', '30', '5',
        '85', '10', '80', '80', '75', '70', '20', '10', '25', '40', '85',
        '25', '50', '75', '95', '80', '95', '50', '95', '65', '35', '15',
        '80', '25', '80', '75', '35', '90', '100', '0', '35', '100',
        '20', '90', '55', '40', '5', '25', '90', '70', '85', '95', '10',
        '15', '70', '70', '10', '80', '40', '50', '65', '50', '70', '0',
        '35', '5', '60', '40', '100', '70', '55', '70', '60', '90', '10',
        '10', '35', '30', '70', '45', '100', '80', '30', '40', '65',
        '80', '95', '65', '40', '25', '100', '35', '45'],
       ['10', '10', '20', '5', '70', '85', '75', '35', '90', '50', '30',
        '55', '25', '45', '70', '40', '85', '5', '80', '30', '85', '70',
        '20', '70', '80', '0', '70', '100', '65', '25', '95', '5', '10',
        '75', '25', '85', '55', '10', '60', '25', '45', '10', '25', '10',
        '50', '5', '100', '80', '80', '15', '5', '70', '90', '35', '35',
        '30', '90', '10', '60', '25', '75', '90', '90', '5', '90', '50',
        '0', '70', '0', '75', '50', '90', '20', '90', '15', '15', '25',
        '50', '35', '45', '70', '0', '75', '60', '40', '40', '5', '90',
        '80', '85', '15', '85', '0', '40', '55', '50', '65', '35', '0',
        '40', '50', '60', '40', '75', '85', '65', '0', '65', '70', '90',
        '10', '5', '0', '65', '65', '85', '65', '85', '5', '95', '90',
        '40', '0', '25', '5', '10', '5', '75', '35', '45', '75', '5',
        '15', '5', '20', '95', '40', '0', '80', '0', '20', '25', '20',
        '45', '0', '60', '40', '0', '85', '10', '90', '55', '0', '30',
        '80', '55', '75', '90', '45', '95', '60', '0', '85', '20', '25',
        '80', '50', '85', '40', '5', '15', '20', '5', '25', '15', '75',
        '70', '5', '100', '80', '5', '95', '30', '55', '85', '65', '100',
        '15', '25', '35', '95', '90', '80', '100', '70', '95', '85',
        '10', '60', '85']], dtype='<U21')
np.array([key,att,rep,mid,fin]).T # the student IDs are strings, so every entry is stored as a string
array([['202212377', '65', '45', '0', '10'],
       ['202212473', '95', '30', '60', '10'],
       ['202212310', '65', '85', '15', '20'],
       ['202212460', '55', '35', '35', '5'],
       ['202212320', '80', '60', '55', '70'],
       ['202212329', '75', '40', '75', '85'],
       ['202212408', '65', '70', '60', '75'],
       ['202212319', '60', '25', '20', '35'],
       ['202212348', '95', '55', '65', '90'],
       ['202212306', '90', '25', '95', '50'],
       ['202212308', '55', '45', '75', '30'],
       ['202212366', '95', '60', '25', '55'],
       ['202212367', '95', '35', '0', '25'],
       ['202212461', '50', '55', '90', '45'],
       ['202212354', '50', '65', '50', '70'],
       ['202212361', '95', '100', '25', '40'],
       ['202212400', '50', '65', '35', '85'],
       ['202212490', '65', '85', '10', '5'],
       ['202212404', '70', '65', '65', '80'],
       ['202212326', '90', '70', '100', '30'],
       ['202212452', '80', '45', '80', '85'],
       ['202212362', '55', '45', '85', '70'],
       ['202212396', '65', '35', '45', '20'],
       ['202212356', '70', '25', '50', '70'],
       ['202212305', '85', '55', '30', '80'],
       ['202212398', '90', '30', '30', '0'],
       ['202212410', '100', '65', '50', '70'],
       ['202212385', '80', '70', '50', '100'],
       ['202212430', '80', '35', '25', '65'],
       ['202212498', '55', '75', '20', '25'],
       ['202212423', '75', '75', '85', '95'],
       ['202212327', '80', '95', '5', '5'],
       ['202212347', '95', '60', '65', '10'],
       ['202212483', '95', '60', '90', '75'],
       ['202212447', '100', '75', '70', '25'],
       ['202212496', '100', '55', '35', '85'],
       ['202212358', '80', '60', '65', '55'],
       ['202212399', '70', '80', '0', '10'],
       ['202212459', '85', '65', '60', '60'],
       ['202212313', '100', '95', '0', '25'],
       ['202212304', '95', '60', '15', '45'],
       ['202212431', '75', '40', '30', '10'],
       ['202212325', '70', '80', '50', '25'],
       ['202212471', '50', '45', '10', '10'],
       ['202212463', '100', '100', '100', '50'],
       ['202212441', '75', '50', '60', '5'],
       ['202212445', '85', '50', '35', '100'],
       ['202212323', '80', '35', '75', '80'],
       ['202212442', '95', '45', '35', '80'],
       ['202212346', '65', '85', '85', '15'],
       ['202212411', '90', '30', '25', '5'],
       ['202212468', '65', '65', '35', '70'],
       ['202212331', '80', '65', '30', '90'],
       ['202212345', '95', '80', '45', '35'],
       ['202212339', '65', '75', '50', '35'],
       ['202212383', '90', '55', '100', '30'],
       ['202212462', '95', '25', '95', '90'],
       ['202212344', '100', '50', '80', '10'],
       ['202212472', '50', '55', '35', '60'],
       ['202212437', '90', '70', '35', '25'],
       ['202212336', '50', '55', '15', '75'],
       ['202212438', '80', '50', '55', '90'],
       ['202212454', '50', '75', '65', '90'],
       ['202212384', '70', '40', '90', '5'],
       ['202212402', '65', '85', '20', '90'],
       ['202212397', '60', '30', '0', '50'],
       ['202212318', '50', '65', '15', '0'],
       ['202212371', '60', '95', '30', '70'],
       ['202212469', '70', '70', '5', '0'],
       ['202212379', '75', '45', '15', '75'],
       ['202212364', '50', '60', '15', '50'],
       ['202212450', '85', '90', '90', '90'],
       ['202212337', '80', '25', '85', '20'],
       ['202212458', '55', '75', '95', '90'],
       ['202212494', '85', '30', '45', '15'],
       ['202212478', '65', '30', '45', '15'],
       ['202212373', '85', '95', '35', '25'],
       ['202212474', '60', '25', '10', '50'],
       ['202212455', '95', '45', '90', '35'],
       ['202212317', '85', '50', '60', '45'],
       ['202212341', '60', '50', '100', '70'],
       ['202212386', '100', '75', '60', '0'],
       ['202212328', '100', '90', '85', '75'],
       ['202212417', '55', '100', '100', '60'],
       ['202212370', '70', '60', '30', '40'],
       ['202212486', '70', '90', '95', '40'],
       ['202212333', '55', '50', '0', '5'],
       ['202212360', '100', '100', '45', '90'],
       ['202212350', '85', '70', '90', '80'],
       ['202212382', '100', '85', '65', '85'],
       ['202212392', '60', '65', '35', '15'],
       ['202212449', '65', '75', '75', '85'],
       ['202212394', '65', '25', '40', '0'],
       ['202212444', '75', '75', '50', '40'],
       ['202212487', '50', '55', '80', '55'],
       ['202212425', '75', '30', '20', '50'],
       ['202212312', '100', '50', '25', '65'],
       ['202212448', '90', '30', '95', '35'],
       ['202212434', '55', '100', '80', '0'],
       ['202212451', '75', '60', '15', '40'],
       ['202212433', '60', '25', '25', '50'],
       ['202212424', '85', '35', '10', '60'],
       ['202212351', '60', '100', '55', '40'],
       ['202212324', '70', '55', '50', '75'],
       ['202212314', '80', '65', '95', '85'],
       ['202212446', '65', '35', '15', '65'],
       ['202212401', '85', '70', '100', '0'],
       ['202212307', '100', '30', '60', '65'],
       ['202212300', '65', '70', '55', '70'],
       ['202212342', '85', '55', '85', '90'],
       ['202212479', '85', '95', '80', '10'],
       ['202212443', '85', '70', '75', '5'],
       ['202212387', '100', '35', '70', '0'],
       ['202212372', '95', '45', '55', '65'],
       ['202212376', '95', '85', '40', '65'],
       ['202212466', '55', '50', '30', '85'],
       ['202212391', '85', '50', '5', '65'],
       ['202212368', '75', '90', '85', '85'],
       ['202212427', '95', '70', '10', '5'],
       ['202212414', '85', '35', '80', '95'],
       ['202212426', '95', '50', '80', '90'],
       ['202212316', '100', '65', '75', '40'],
       ['202212355', '95', '70', '70', '0'],
       ['202212477', '95', '70', '20', '25'],
       ['202212484', '100', '60', '10', '5'],
       ['202212456', '55', '35', '25', '10'],
       ['202212500', '60', '90', '40', '5'],
       ['202212381', '85', '90', '85', '75'],
       ['202212335', '75', '85', '25', '35'],
       ['202212475', '55', '30', '50', '45'],
       ['202212343', '70', '60', '75', '75'],
       ['202212412', '80', '30', '95', '5'],
       ['202212428', '90', '85', '80', '15'],
       ['202212330', '90', '25', '95', '5'],
       ['202212375', '60', '85', '50', '20'],
       ['202212413', '90', '50', '95', '95'],
       ['202212303', '75', '95', '65', '40'],
       ['202212374', '60', '40', '35', '0'],
       ['202212409', '55', '100', '15', '80'],
       ['202212440', '70', '75', '80', '0'],
       ['202212393', '75', '65', '25', '20'],
       ['202212492', '90', '75', '80', '25'],
       ['202212357', '50', '75', '75', '20'],
       ['202212465', '55', '45', '35', '45'],
       ['202212415', '90', '70', '90', '0'],
       ['202212405', '75', '30', '100', '60'],
       ['202212435', '90', '85', '0', '40'],
       ['202212380', '85', '70', '35', '0'],
       ['202212369', '100', '75', '100', '85'],
       ['202212467', '55', '35', '20', '10'],
       ['202212429', '70', '75', '90', '90'],
       ['202212495', '90', '90', '55', '55'],
       ['202212420', '55', '60', '40', '0'],
       ['202212302', '100', '90', '5', '30'],
       ['202212481', '50', '55', '25', '80'],
       ['202212422', '100', '100', '90', '55'],
       ['202212388', '70', '45', '70', '75'],
       ['202212480', '85', '95', '85', '90'],
       ['202212378', '55', '25', '95', '45'],
       ['202212457', '75', '30', '10', '95'],
       ['202212419', '65', '85', '15', '60'],
       ['202212432', '70', '90', '70', '0'],
       ['202212395', '60', '85', '70', '85'],
       ['202212464', '100', '25', '10', '20'],
       ['202212476', '75', '25', '80', '25'],
       ['202212332', '90', '95', '40', '80'],
       ['202212301', '95', '90', '50', '50'],
       ['202212497', '90', '90', '65', '85'],
       ['202212309', '95', '75', '50', '40'],
       ['202212493', '55', '60', '70', '5'],
       ['202212311', '95', '85', '0', '15'],
       ['202212416', '65', '60', '35', '20'],
       ['202212489', '65', '50', '5', '5'],
       ['202212359', '90', '25', '60', '25'],
       ['202212349', '100', '40', '40', '15'],
       ['202212403', '70', '25', '100', '75'],
       ['202212418', '100', '30', '70', '70'],
       ['202212406', '50', '55', '55', '5'],
       ['202212485', '70', '35', '70', '100'],
       ['202212390', '70', '60', '60', '80'],
       ['202212365', '55', '45', '90', '5'],
       ['202212338', '55', '55', '10', '95'],
       ['202212363', '65', '80', '10', '30'],
       ['202212321', '90', '25', '35', '55'],
       ['202212499', '100', '30', '30', '85'],
       ['202212340', '70', '85', '70', '65'],
       ['202212421', '60', '100', '45', '100'],
       ['202212407', '70', '25', '100', '15'],
       ['202212439', '70', '35', '80', '25'],
       ['202212488', '65', '60', '30', '35'],
       ['202212453', '95', '35', '40', '95'],
       ['202212482', '50', '80', '65', '90'],
       ['202212334', '100', '40', '80', '80'],
       ['202212322', '55', '30', '95', '100'],
       ['202212353', '65', '40', '65', '70'],
       ['202212491', '55', '70', '40', '95'],
       ['202212352', '65', '85', '25', '85'],
       ['202212315', '85', '85', '100', '10'],
       ['202212470', '80', '65', '35', '60'],
       ['202212436', '50', '95', '45', '85']], dtype='<U21')
test_ndarray
array([[202212377,        65,        45,         0,        10],
       [202212473,        95,        30,        60,        10],
       [202212310,        65,        85,        15,        20],
       [202212460,        55,        35,        35,         5],
       [202212320,        80,        60,        55,        70],
       [202212329,        75,        40,        75,        85],
       [202212408,        65,        70,        60,        75],
       [202212319,        60,        25,        20,        35],
       [202212348,        95,        55,        65,        90],
       [202212306,        90,        25,        95,        50],
       [202212308,        55,        45,        75,        30],
       [202212366,        95,        60,        25,        55],
       [202212367,        95,        35,         0,        25],
       [202212461,        50,        55,        90,        45],
       [202212354,        50,        65,        50,        70],
       [202212361,        95,       100,        25,        40],
       [202212400,        50,        65,        35,        85],
       [202212490,        65,        85,        10,         5],
       [202212404,        70,        65,        65,        80],
       [202212326,        90,        70,       100,        30],
       [202212452,        80,        45,        80,        85],
       [202212362,        55,        45,        85,        70],
       [202212396,        65,        35,        45,        20],
       [202212356,        70,        25,        50,        70],
       [202212305,        85,        55,        30,        80],
       [202212398,        90,        30,        30,         0],
       [202212410,       100,        65,        50,        70],
       [202212385,        80,        70,        50,       100],
       [202212430,        80,        35,        25,        65],
       [202212498,        55,        75,        20,        25],
       [202212423,        75,        75,        85,        95],
       [202212327,        80,        95,         5,         5],
       [202212347,        95,        60,        65,        10],
       [202212483,        95,        60,        90,        75],
       [202212447,       100,        75,        70,        25],
       [202212496,       100,        55,        35,        85],
       [202212358,        80,        60,        65,        55],
       [202212399,        70,        80,         0,        10],
       [202212459,        85,        65,        60,        60],
       [202212313,       100,        95,         0,        25],
       [202212304,        95,        60,        15,        45],
       [202212431,        75,        40,        30,        10],
       [202212325,        70,        80,        50,        25],
       [202212471,        50,        45,        10,        10],
       [202212463,       100,       100,       100,        50],
       [202212441,        75,        50,        60,         5],
       [202212445,        85,        50,        35,       100],
       [202212323,        80,        35,        75,        80],
       [202212442,        95,        45,        35,        80],
       [202212346,        65,        85,        85,        15],
       [202212411,        90,        30,        25,         5],
       [202212468,        65,        65,        35,        70],
       [202212331,        80,        65,        30,        90],
       [202212345,        95,        80,        45,        35],
       [202212339,        65,        75,        50,        35],
       [202212383,        90,        55,       100,        30],
       [202212462,        95,        25,        95,        90],
       [202212344,       100,        50,        80,        10],
       [202212472,        50,        55,        35,        60],
       [202212437,        90,        70,        35,        25],
       [202212336,        50,        55,        15,        75],
       [202212438,        80,        50,        55,        90],
       [202212454,        50,        75,        65,        90],
       [202212384,        70,        40,        90,         5],
       [202212402,        65,        85,        20,        90],
       [202212397,        60,        30,         0,        50],
       [202212318,        50,        65,        15,         0],
       [202212371,        60,        95,        30,        70],
       [202212469,        70,        70,         5,         0],
       [202212379,        75,        45,        15,        75],
       [202212364,        50,        60,        15,        50],
       [202212450,        85,        90,        90,        90],
       [202212337,        80,        25,        85,        20],
       [202212458,        55,        75,        95,        90],
       [202212494,        85,        30,        45,        15],
       [202212478,        65,        30,        45,        15],
       [202212373,        85,        95,        35,        25],
       [202212474,        60,        25,        10,        50],
       [202212455,        95,        45,        90,        35],
       [202212317,        85,        50,        60,        45],
       [202212341,        60,        50,       100,        70],
       [202212386,       100,        75,        60,         0],
       [202212328,       100,        90,        85,        75],
       [202212417,        55,       100,       100,        60],
       [202212370,        70,        60,        30,        40],
       [202212486,        70,        90,        95,        40],
       [202212333,        55,        50,         0,         5],
       [202212360,       100,       100,        45,        90],
       [202212350,        85,        70,        90,        80],
       [202212382,       100,        85,        65,        85],
       [202212392,        60,        65,        35,        15],
       [202212449,        65,        75,        75,        85],
       [202212394,        65,        25,        40,         0],
       [202212444,        75,        75,        50,        40],
       [202212487,        50,        55,        80,        55],
       [202212425,        75,        30,        20,        50],
       [202212312,       100,        50,        25,        65],
       [202212448,        90,        30,        95,        35],
       [202212434,        55,       100,        80,         0],
       [202212451,        75,        60,        15,        40],
       [202212433,        60,        25,        25,        50],
       [202212424,        85,        35,        10,        60],
       [202212351,        60,       100,        55,        40],
       [202212324,        70,        55,        50,        75],
       [202212314,        80,        65,        95,        85],
       [202212446,        65,        35,        15,        65],
       [202212401,        85,        70,       100,         0],
       [202212307,       100,        30,        60,        65],
       [202212300,        65,        70,        55,        70],
       [202212342,        85,        55,        85,        90],
       [202212479,        85,        95,        80,        10],
       [202212443,        85,        70,        75,         5],
       [202212387,       100,        35,        70,         0],
       [202212372,        95,        45,        55,        65],
       [202212376,        95,        85,        40,        65],
       [202212466,        55,        50,        30,        85],
       [202212391,        85,        50,         5,        65],
       [202212368,        75,        90,        85,        85],
       [202212427,        95,        70,        10,         5],
       [202212414,        85,        35,        80,        95],
       [202212426,        95,        50,        80,        90],
       [202212316,       100,        65,        75,        40],
       [202212355,        95,        70,        70,         0],
       [202212477,        95,        70,        20,        25],
       [202212484,       100,        60,        10,         5],
       [202212456,        55,        35,        25,        10],
       [202212500,        60,        90,        40,         5],
       [202212381,        85,        90,        85,        75],
       [202212335,        75,        85,        25,        35],
       [202212475,        55,        30,        50,        45],
       [202212343,        70,        60,        75,        75],
       [202212412,        80,        30,        95,         5],
       [202212428,        90,        85,        80,        15],
       [202212330,        90,        25,        95,         5],
       [202212375,        60,        85,        50,        20],
       [202212413,        90,        50,        95,        95],
       [202212303,        75,        95,        65,        40],
       [202212374,        60,        40,        35,         0],
       [202212409,        55,       100,        15,        80],
       [202212440,        70,        75,        80,         0],
       [202212393,        75,        65,        25,        20],
       [202212492,        90,        75,        80,        25],
       [202212357,        50,        75,        75,        20],
       [202212465,        55,        45,        35,        45],
       [202212415,        90,        70,        90,         0],
       [202212405,        75,        30,       100,        60],
       [202212435,        90,        85,         0,        40],
       [202212380,        85,        70,        35,         0],
       [202212369,       100,        75,       100,        85],
       [202212467,        55,        35,        20,        10],
       [202212429,        70,        75,        90,        90],
       [202212495,        90,        90,        55,        55],
       [202212420,        55,        60,        40,         0],
       [202212302,       100,        90,         5,        30],
       [202212481,        50,        55,        25,        80],
       [202212422,       100,       100,        90,        55],
       [202212388,        70,        45,        70,        75],
       [202212480,        85,        95,        85,        90],
       [202212378,        55,        25,        95,        45],
       [202212457,        75,        30,        10,        95],
       [202212419,        65,        85,        15,        60],
       [202212432,        70,        90,        70,         0],
       [202212395,        60,        85,        70,        85],
       [202212464,       100,        25,        10,        20],
       [202212476,        75,        25,        80,        25],
       [202212332,        90,        95,        40,        80],
       [202212301,        95,        90,        50,        50],
       [202212497,        90,        90,        65,        85],
       [202212309,        95,        75,        50,        40],
       [202212493,        55,        60,        70,         5],
       [202212311,        95,        85,         0,        15],
       [202212416,        65,        60,        35,        20],
       [202212489,        65,        50,         5,         5],
       [202212359,        90,        25,        60,        25],
       [202212349,       100,        40,        40,        15],
       [202212403,        70,        25,       100,        75],
       [202212418,       100,        30,        70,        70],
       [202212406,        50,        55,        55,         5],
       [202212485,        70,        35,        70,       100],
       [202212390,        70,        60,        60,        80],
       [202212365,        55,        45,        90,         5],
       [202212338,        55,        55,        10,        95],
       [202212363,        65,        80,        10,        30],
       [202212321,        90,        25,        35,        55],
       [202212499,       100,        30,        30,        85],
       [202212340,        70,        85,        70,        65],
       [202212421,        60,       100,        45,       100],
       [202212407,        70,        25,       100,        15],
       [202212439,        70,        35,        80,        25],
       [202212488,        65,        60,        30,        35],
       [202212453,        95,        35,        40,        95],
       [202212482,        50,        80,        65,        90],
       [202212334,       100,        40,        80,        80],
       [202212322,        55,        30,        95,       100],
       [202212353,        65,        40,        65,        70],
       [202212491,        55,        70,        40,        95],
       [202212352,        65,        85,        25,        85],
       [202212315,        85,        85,       100,        10],
       [202212470,        80,        65,        35,        60],
       [202212436,        50,        95,        45,        85]])

(Solution 2)

test_ndarray[test_ndarray[:,0] == 202212460]
array([[202212460,        55,        35,        35,         5]])
test_ndarray[test_ndarray[:,0] == 202212460,1]   # what does this select? hard to read
array([55])

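Drawbacks of (Solution 2) compared with (Solution 1):

  • You have to remember that the first column of test_ndarray is the student id and the second column is att.
  • If the data were organized by student name (a string) instead of student id, every entry in the array would have to be stored as a string.
  • The code is hard to read, because values are accessed by position.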
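
The contrast can be sketched on a tiny made-up data set. The names `scores_dic`, `scores_arr`, and `mixed` are illustrative assumptions, not part of the lecture data.

```python
import numpy as np

# Hypothetical mini data set: two students with attendance and final scores.
scores_dic = {'202212460': {'att': 55, 'fin': 5},
              '202212320': {'att': 80, 'fin': 70}}
scores_arr = np.array([[202212460, 55, 5],
                       [202212320, 80, 70]])

# Hash style: the keys themselves say what is being looked up.
att_by_key = scores_dic['202212460']['att']

# Position style: the reader must remember that column 0 is the id
# and column 1 is the attendance score.
att_by_pos = scores_arr[scores_arr[:, 0] == 202212460, 1]

# Putting a name (string) into the array turns every entry into a string.
mixed = np.array([['boram', 55, 5]])
print(att_by_key, att_by_pos, mixed.dtype.kind)
```

Both lookups return the same score, but only the first one is readable without knowing the column layout.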

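- Summary: extracting information hash-style (by key) is sometimes useful, and it is usually the more convenient way to pull out information.

(numpy was not designed for extracting information; it was designed to support mathematical operations on matrices and vectors.)

- Wish: fine, hash-style extraction is useful. But sometimes I still want to pull out information numpy-style, and I want to see the data laid out like an Excel sheet (like a matrix) rather than as a dictionary. This is why pandas was developed.

Motivation for developing pandas

We want to organize data in table form, as in Excel.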

np.random.seed(43052)
att = np.random.choice(np.arange(10,21)*5,20)
rep = np.random.choice(np.arange(5,21)*5,20)
mid = np.random.choice(np.arange(0,21)*5,20)
fin = np.random.choice(np.arange(0,21)*5,20)
key = ['202212'+str(s) for s in np.random.choice(np.arange(300,501),20,replace=False)]
test_dic = {key[i] : {'att':att[i], 'rep':rep[i], 'mid':mid[i], 'fin':fin[i]} for i in range(20)}
test_ndarray = np.array([key,att,rep,mid,fin],dtype=np.int64).T
test_dic
{'202212380': {'att': 65, 'rep': 55, 'mid': 50, 'fin': 40},
 '202212370': {'att': 95, 'rep': 100, 'mid': 50, 'fin': 80},
 '202212363': {'att': 65, 'rep': 90, 'mid': 60, 'fin': 30},
 '202212488': {'att': 55, 'rep': 80, 'mid': 75, 'fin': 80},
 '202212312': {'att': 80, 'rep': 30, 'mid': 30, 'fin': 100},
 '202212377': {'att': 75, 'rep': 40, 'mid': 100, 'fin': 15},
 '202212463': {'att': 65, 'rep': 45, 'mid': 45, 'fin': 90},
 '202212471': {'att': 60, 'rep': 60, 'mid': 25, 'fin': 0},
 '202212400': {'att': 95, 'rep': 65, 'mid': 20, 'fin': 10},
 '202212469': {'att': 90, 'rep': 80, 'mid': 80, 'fin': 20},
 '202212318': {'att': 55, 'rep': 75, 'mid': 35, 'fin': 25},
 '202212432': {'att': 95, 'rep': 95, 'mid': 45, 'fin': 0},
 '202212443': {'att': 95, 'rep': 55, 'mid': 15, 'fin': 35},
 '202212367': {'att': 50, 'rep': 80, 'mid': 40, 'fin': 30},
 '202212458': {'att': 50, 'rep': 55, 'mid': 15, 'fin': 85},
 '202212396': {'att': 95, 'rep': 30, 'mid': 30, 'fin': 95},
 '202212482': {'att': 50, 'rep': 50, 'mid': 45, 'fin': 10},
 '202212452': {'att': 65, 'rep': 55, 'mid': 15, 'fin': 45},
 '202212387': {'att': 70, 'rep': 70, 'mid': 40, 'fin': 35},
 '202212354': {'att': 90, 'rep': 90, 'mid': 80, 'fin': 90}}
  • We want to see this as a table.

(Method 1) - This is a matrix, but compared with Methods 2, 3, and 4 it is not as readable as we would like.

test_ndarray = np.array([key,att,rep,mid,fin],dtype=np.int64).T
test_ndarray
array([[202212380,        65,        55,        50,        40],
       [202212370,        95,       100,        50,        80],
       [202212363,        65,        90,        60,        30],
       [202212488,        55,        80,        75,        80],
       [202212312,        80,        30,        30,       100],
       [202212377,        75,        40,       100,        15],
       [202212463,        65,        45,        45,        90],
       [202212471,        60,        60,        25,         0],
       [202212400,        95,        65,        20,        10],
       [202212469,        90,        80,        80,        20],
       [202212318,        55,        75,        35,        25],
       [202212432,        95,        95,        45,         0],
       [202212443,        95,        55,        15,        35],
       [202212367,        50,        80,        40,        30],
       [202212458,        50,        55,        15,        85],
       [202212396,        95,        30,        30,        95],
       [202212482,        50,        50,        45,        10],
       [202212452,        65,        55,        15,        45],
       [202212387,        70,        70,        40,        35],
       [202212354,        90,        90,        80,        90]])

(Method 2)

pd.DataFrame(test_dic).T
att rep mid fin
202212380 65 55 50 40
202212370 95 100 50 80
202212363 65 90 60 30
202212488 55 80 75 80
202212312 80 30 30 100
202212377 75 40 100 15
202212463 65 45 45 90
202212471 60 60 25 0
202212400 95 65 20 10
202212469 90 80 80 20
202212318 55 75 35 25
202212432 95 95 45 0
202212443 95 55 15 35
202212367 50 80 40 30
202212458 50 55 15 85
202212396 95 30 30 95
202212482 50 50 45 10
202212452 65 55 15 45
202212387 70 70 40 35
202212354 90 90 80 90

(Method 3)

test_dic2 = {'att':{key[i]:att[i] for i in range(20)},
             'rep':{key[i]:rep[i] for i in range(20)},
             'mid':{key[i]:mid[i] for i in range(20)},
             'fin':{key[i]:fin[i] for i in range(20)}}
test_dic2
{'att': {'202212380': 65,
  '202212370': 95,
  '202212363': 65,
  '202212488': 55,
  '202212312': 80,
  '202212377': 75,
  '202212463': 65,
  '202212471': 60,
  '202212400': 95,
  '202212469': 90,
  '202212318': 55,
  '202212432': 95,
  '202212443': 95,
  '202212367': 50,
  '202212458': 50,
  '202212396': 95,
  '202212482': 50,
  '202212452': 65,
  '202212387': 70,
  '202212354': 90},
 'rep': {'202212380': 55,
  '202212370': 100,
  '202212363': 90,
  '202212488': 80,
  '202212312': 30,
  '202212377': 40,
  '202212463': 45,
  '202212471': 60,
  '202212400': 65,
  '202212469': 80,
  '202212318': 75,
  '202212432': 95,
  '202212443': 55,
  '202212367': 80,
  '202212458': 55,
  '202212396': 30,
  '202212482': 50,
  '202212452': 55,
  '202212387': 70,
  '202212354': 90},
 'mid': {'202212380': 50,
  '202212370': 50,
  '202212363': 60,
  '202212488': 75,
  '202212312': 30,
  '202212377': 100,
  '202212463': 45,
  '202212471': 25,
  '202212400': 20,
  '202212469': 80,
  '202212318': 35,
  '202212432': 45,
  '202212443': 15,
  '202212367': 40,
  '202212458': 15,
  '202212396': 30,
  '202212482': 45,
  '202212452': 15,
  '202212387': 40,
  '202212354': 80},
 'fin': {'202212380': 40,
  '202212370': 80,
  '202212363': 30,
  '202212488': 80,
  '202212312': 100,
  '202212377': 15,
  '202212463': 90,
  '202212471': 0,
  '202212400': 10,
  '202212469': 20,
  '202212318': 25,
  '202212432': 0,
  '202212443': 35,
  '202212367': 30,
  '202212458': 85,
  '202212396': 95,
  '202212482': 10,
  '202212452': 45,
  '202212387': 35,
  '202212354': 90}}
pd.DataFrame(test_dic2)
att rep mid fin
202212380 65 55 50 40
202212370 95 100 50 80
202212363 65 90 60 30
202212488 55 80 75 80
202212312 80 30 30 100
202212377 75 40 100 15
202212463 65 45 45 90
202212471 60 60 25 0
202212400 95 65 20 10
202212469 90 80 80 20
202212318 55 75 35 25
202212432 95 95 45 0
202212443 95 55 15 35
202212367 50 80 40 30
202212458 50 55 15 85
202212396 95 30 30 95
202212482 50 50 45 10
202212452 65 55 15 45
202212387 70 70 40 35
202212354 90 90 80 90

(Method 4)

df = pd.DataFrame({'att':att, 'rep':rep, 'mid':mid, 'fin':fin}, index=key)
df
att rep mid fin
202212380 65 55 50 40
202212370 95 100 50 80
202212363 65 90 60 30
202212488 55 80 75 80
202212312 80 30 30 100
202212377 75 40 100 15
202212463 65 45 45 90
202212471 60 60 25 0
202212400 95 65 20 10
202212469 90 80 80 20
202212318 55 75 35 25
202212432 95 95 45 0
202212443 95 55 15 35
202212367 50 80 40 30
202212458 50 55 15 85
202212396 95 30 30 95
202212482 50 50 45 10
202212452 65 55 15 45
202212387 70 70 40 35
202212354 90 90 80 90

(Method 5)

df = pd.DataFrame({'att':att, 'rep':rep, 'mid':mid, 'fin':fin})
df
att rep mid fin
0 65 55 50 40
1 95 100 50 80
2 65 90 60 30
3 55 80 75 80
4 80 30 30 100
5 75 40 100 15
6 65 45 45 90
7 60 60 25 0
8 95 65 20 10
9 90 80 80 20
10 55 75 35 25
11 95 95 45 0
12 95 55 15 35
13 50 80 40 30
14 50 55 15 85
15 95 30 30 95
16 50 50 45 10
17 65 55 15 45
18 70 70 40 35
19 90 90 80 90
df=df.set_index([key])   # the index can also be set afterwards with set_index
df
att rep mid fin
202212380 65 55 50 40
202212370 95 100 50 80
202212363 65 90 60 30
202212488 55 80 75 80
202212312 80 30 30 100
202212377 75 40 100 15
202212463 65 45 45 90
202212471 60 60 25 0
202212400 95 65 20 10
202212469 90 80 80 20
202212318 55 75 35 25
202212432 95 95 45 0
202212443 95 55 15 35
202212367 50 80 40 30
202212458 50 55 15 85
202212396 95 30 30 95
202212482 50 50 45 10
202212452 65 55 15 45
202212387 70 70 40 35
202212354 90 90 80 90

We want to pull out information by hashing (as with a dictionary).

- Example 1: print the attendance scores

test_dic2['att']
{'202212380': 65,
 '202212370': 95,
 '202212363': 65,
 '202212488': 55,
 '202212312': 80,
 '202212377': 75,
 '202212463': 65,
 '202212471': 60,
 '202212400': 95,
 '202212469': 90,
 '202212318': 55,
 '202212432': 95,
 '202212443': 95,
 '202212367': 50,
 '202212458': 50,
 '202212396': 95,
 '202212482': 50,
 '202212452': 65,
 '202212387': 70,
 '202212354': 90}
df['att']
202212380    65
202212370    95
202212363    65
202212488    55
202212312    80
202212377    75
202212463    65
202212471    60
202212400    95
202212469    90
202212318    55
202212432    95
202212443    95
202212367    50
202212458    50
202212396    95
202212482    50
202212452    65
202212387    70
202212354    90
Name: att, dtype: int64

- Example 2: print the attendance score of student 202212380

test_dic2['att']['202212380']
65
df['att']['202212380']
65
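
The chained lookup above mirrors the nested dictionary. pandas also provides `.loc`, which takes the row label and the column label in one step. A minimal sketch on a hypothetical two-row table (`df_small` is an assumed name, not the lecture's `df`):

```python
import pandas as pd

# Hypothetical two-student table; index labels are student ids stored as strings.
df_small = pd.DataFrame({'att': [65, 95], 'fin': [40, 80]},
                        index=['202212380', '202212370'])

chained = df_small['att']['202212380']      # column first, then row label
single = df_small.loc['202212380', 'att']   # row label and column label at once
print(chained, single)
```

Both expressions read the same cell; the `.loc` form does it with a single indexing operation.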

We also want indexing to be supported (as with a list or a numpy array).

- Example 1: print the final exam score of the first student.

test_ndarray[0,-1]
40
df.iloc[0,-1]  
40
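  • Quick tip: the special accessor iloc on df lets you select elements with numpy-style indexing.

- Example 2: select the scores of the odd-numbered students (1st, 3rd, 5th, ...).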

test_ndarray[::2]
array([[202212380,        65,        55,        50,        40],
       [202212363,        65,        90,        60,        30],
       [202212312,        80,        30,        30,       100],
       [202212463,        65,        45,        45,        90],
       [202212400,        95,        65,        20,        10],
       [202212318,        55,        75,        35,        25],
       [202212443,        95,        55,        15,        35],
       [202212458,        50,        55,        15,        85],
       [202212482,        50,        50,        45,        10],
       [202212387,        70,        70,        40,        35]])
df.iloc[0::2]
att rep mid fin
202212380 65 55 50 40
202212363 65 90 60 30
202212312 80 30 30 100
202212463 65 45 45 90
202212400 95 65 20 10
202212318 55 75 35 25
202212443 95 55 15 35
202212458 50 55 15 85
202212482 50 50 45 10
202212387 70 70 40 35

- Example 3: print the scores of the last 3 students.

test_ndarray[-3:]
array([[202212452,        65,        55,        15,        45],
       [202212387,        70,        70,        40,        35],
       [202212354,        90,        90,        80,        90]])
df.iloc[-3:]
att rep mid fin
202212452 65 55 15 45
202212387 70 70 40 35
202212354 90 90 80 90

- Example 4: print only the last 2 columns for the last 3 students.

test_ndarray[-3:,-2:]
array([[15, 45],
       [40, 35],
       [80, 90]])
df.iloc[-3:,-2:]
mid fin
202212452 15 45
202212387 40 35
202212354 80 90

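Ultimate goal: a data type that supports both hashing and indexing!

- Example 1: print the final exam scores of students whose midterm score is at least 20 and whose attendance score is below 60

(Method 1) Database style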

df.query("mid>=20 and att<60")
att rep mid fin
202212488 55 80 75 80
202212318 55 75 35 25
202212367 50 80 40 30
202212482 50 50 45 10
df.query("mid>=20 and att<60")['fin']
202212488    80
202212318    25
202212367    30
202212482    10
Name: fin, dtype: int64

(Method 2) numpy style

test_ndarray
array([[202212380,        65,        55,        50,        40],
       [202212370,        95,       100,        50,        80],
       [202212363,        65,        90,        60,        30],
       [202212488,        55,        80,        75,        80],
       [202212312,        80,        30,        30,       100],
       [202212377,        75,        40,       100,        15],
       [202212463,        65,        45,        45,        90],
       [202212471,        60,        60,        25,         0],
       [202212400,        95,        65,        20,        10],
       [202212469,        90,        80,        80,        20],
       [202212318,        55,        75,        35,        25],
       [202212432,        95,        95,        45,         0],
       [202212443,        95,        55,        15,        35],
       [202212367,        50,        80,        40,        30],
       [202212458,        50,        55,        15,        85],
       [202212396,        95,        30,        30,        95],
       [202212482,        50,        50,        45,        10],
       [202212452,        65,        55,        15,        45],
       [202212387,        70,        70,        40,        35],
       [202212354,        90,        90,        80,        90]])
test_ndarray[:,3]>=20 # midterm score of at least 20
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True, False,  True, False,  True,  True, False,
        True,  True])
test_ndarray[:,1] < 60 # attendance below 60
array([False, False, False,  True, False, False, False, False, False,
       False,  True, False, False,  True,  True, False,  True, False,
       False, False])
(test_ndarray[:,3]>=20) & (test_ndarray[:,1] < 60)
array([False, False, False,  True, False, False, False, False, False,
       False,  True, False, False,  True, False, False,  True, False,
       False, False])

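note: writing test_ndarray[:,3]>=20 & test_ndarray[:,1] < 60 raises an error, because & binds more tightly than the comparison operators. Each condition must be wrapped in parentheses.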
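
The precedence issue can be reproduced on two short made-up arrays (`mid_demo` and `att_demo` are assumed names for illustration):

```python
import numpy as np

mid_demo = np.array([50, 15, 75])   # hypothetical midterm scores
att_demo = np.array([65, 95, 55])   # hypothetical attendance scores

# Without parentheses, & is applied first, so Python evaluates the chained
# comparison mid_demo >= (20 & att_demo) < 60, which needs a single True/False
# from an array and therefore fails.
try:
    mid_demo >= 20 & att_demo < 60
    failed = False
except ValueError:
    failed = True

# With parentheses, each comparison is evaluated first and the two boolean
# arrays are then combined element by element.
mask = (mid_demo >= 20) & (att_demo < 60)
print(failed, mask)
```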

test_ndarray[(test_ndarray[:,3]>=20) & (test_ndarray[:,1] < 60),-1] 
array([80, 25, 30, 10])
  • Hard to implement, and poor readability.

- Example 2: compute the mean attendance score of students whose midterm score is lower than their final score.
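
For comparison, the same boolean-mask idea can be written against a DataFrame, where columns are addressed by name instead of by position. This is only a sketch on a hypothetical three-row table (`df_demo` is an assumed name), not the lecture's own solution:

```python
import pandas as pd

# Hypothetical three-student table.
df_demo = pd.DataFrame({'att': [65, 55, 50], 'mid': [50, 75, 15], 'fin': [40, 80, 85]},
                       index=['202212380', '202212488', '202212458'])

# Same logic as the numpy version, but the column names document the intent.
mask = (df_demo['mid'] >= 20) & (df_demo['att'] < 60)
fin_selected = df_demo.loc[mask, 'fin']
print(fin_selected)
```

The mask still needs parentheses around each condition, but there is no need to remember which column number holds which score.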

df.query('mid<fin')['att'].mean()
76.66666666666667

How to use pandas

pandas study, stage 1

Declaring a DataFrame

- Method 1: build it from a dictionary.

pd.DataFrame({'att':[30,40,50],'mid':[50,60,70]})  # lists
att mid
0 30 50
1 40 60
2 50 70
pd.DataFrame({'att':(30,40,50),'mid':(50,60,70)})  # tuples
att mid
0 30 50
1 40 60
2 50 70
pd.DataFrame({'att':np.array([30,40,50]),'mid':np.array([50,60,70])})  # numpy arrays
att mid
0 30 50
1 40 60
2 50 70

- Method 2: build it by converting a 2-D ndarray.

np.arange(2*3).reshape(2,3)
array([[0, 1, 2],
       [3, 4, 5]])
pd.DataFrame(np.arange(2*3).reshape(2,3))
   0  1  2
0  0  1  2
1  3  4  5

Naming the columns

- Method 1: when the DataFrame is built from a dictionary, the dictionary keys automatically become the column names.

pd.DataFrame({'att':np.array([30,40,50]),'mid':np.array([50,60,70])})
att mid
0 30 50
1 40 60
2 50 70

- Method 2: use the columns option of pd.DataFrame()

pd.DataFrame(np.arange(2*3).reshape(2,3),columns=['X1','X2','X3'])
X1 X2 X3
0 0 1 2
1 3 4 5

- Method 3: overwrite df.columns with the desired column names (1)

df=pd.DataFrame(np.arange(2*3).reshape(2,3))
df
   0  1  2
0  0  1  2
1  3  4  5
df.columns
RangeIndex(start=0, stop=3, step=1)
df.columns=['X1','X2','X3']
df
X1 X2 X3
0 0 1 2
1 3 4 5

- Method 4: overwrite df.columns with the desired column names (2)

df.columns=pd.Index(['X1','X2','X3'])  # same effect as the code above
df
X1 X2 X3
0 0 1 2
1 3 4 5

Method 4 is more explicit to the computer than Method 3 (= it can prevent unnecessary errors), because df.columns is itself a pd.Index.

df.columns
Index(['X1', 'X2', 'X3'], dtype='object')
type(df.columns)
pandas.core.indexes.base.Index
['X1','X2','X3'], type(['X1','X2','X3'])
(['X1', 'X2', 'X3'], list)
pd.Index(['X1','X2','X3'])
Index(['X1', 'X2', 'X3'], dtype='object')
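
A list assigned to `df.columns` is converted to an `Index` on assignment, so both forms end up with the same type of object. A short check on a throwaway frame (`df_demo` is an assumed name):

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(2*3).reshape(2, 3))

df_demo.columns = ['X1', 'X2', 'X3']             # plain list
type_after_list = type(df_demo.columns)          # stored as an Index anyway

df_demo.columns = pd.Index(['X1', 'X2', 'X3'])   # explicit Index
type_after_index = type(df_demo.columns)
print(type_after_list, type_after_index)
```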

Naming the rows

- Method 1: with a nested dict, the keys of the inner dicts automatically become the row names.

{'att': {'boram':30, 'iu':40, 'hynn':50}, 'mid':{'boram':5, 'iu':45, 'hynn':90}}
{'att': {'boram': 30, 'iu': 40, 'hynn': 50},
 'mid': {'boram': 5, 'iu': 45, 'hynn': 90}}
pd.DataFrame({'att': {'boram':30, 'iu':40, 'hynn':50}, 'mid':{'boram':5, 'iu':45, 'hynn':90}})
att mid
boram 30 5
iu 40 45
hynn 50 90

- Method 2: use the index option

pd.DataFrame({'att': [30, 40,50], 'mid':[5,45, 90]}, index=['boram','iu','hynn'])
att mid
boram 30 5
iu 40 45
hynn 50 90

- Method 3: overwrite df.index

df=pd.DataFrame({'att': [30, 40,50], 'mid':[5,45, 90]})
df
att mid
0 30 5
1 40 45
2 50 90
df.index
RangeIndex(start=0, stop=3, step=1)
df.index=['boram','iu','hynn']
df
att mid
boram 30 5
iu 40 45
hynn 50 90
df.index= pd.Index(['boram','iu','hynn'])  # the more explicit, safer form for the computer
df
att mid
boram 30 5
iu 40 45
hynn 50 90

- Method 4: set it with df.set_index() (this returns a new DataFrame rather than modifying df in place).

df=pd.DataFrame({'att': [30, 40,50], 'mid':[5,45, 90]})
df
att mid
0 30 5
1 40 45
2 50 90
df.set_index(pd.Index(['boram','iu','hynn']))
att mid
boram 30 5
iu 40 45
hynn 50 90

(Caution) The following raises an error.

df.set_index(['boram','iu','hynn'])
KeyError: "None of ['boram', 'iu', 'hynn'] are in the columns"
df.set_index([['boram','iu','hynn']]) # wrapping the list in one more pair of brackets avoids the error
att mid
boram 30 5
iu 40 45
hynn 50 90
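
The difference comes from how `set_index` reads its argument: a flat list of strings is taken as column names, whereas an `Index` (or a list nested inside another list) is taken as the new labels themselves. A minimal sketch on a throwaway frame (`df_demo` is an assumed name):

```python
import pandas as pd

df_demo = pd.DataFrame({'att': [30, 40, 50], 'mid': [5, 45, 90]})

# A flat list of strings is read as column names, and no such columns exist.
try:
    df_demo.set_index(['boram', 'iu', 'hynn'])
    raised = False
except KeyError:
    raised = True

# An existing column name works: that column becomes the index.
by_column = df_demo.set_index('att')

# An Index object is read as the new row labels themselves.
by_labels = df_demo.set_index(pd.Index(['boram', 'iu', 'hynn']))
print(raised, list(by_column.index), list(by_labels.index))
```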

Type, len, shape, and the loop variable of a for statement

df= pd.DataFrame({'att':[30,40,50],'mid':[5,45,90]})
df
att mid
0 30 5
1 40 45
2 50 90

- type

type(df)
pandas.core.frame.DataFrame

- len

len(df) # number of rows
3

- shape

df.shape
(3, 2)

- loop variable of a for statement

for k in df:
    print(k) # same as a dictionary: iterates over the keys (column names)
att
mid
for k in {'att':[30,40,50],'mid':[5,45,90]}: # dictionary
    print(k)
att
mid
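
Since a plain `for` loop over a DataFrame yields only column names, the columns or rows themselves have to be requested explicitly. A sketch using `items()` and `iterrows()` on a throwaway frame (`df_demo` is an assumed name):

```python
import pandas as pd

df_demo = pd.DataFrame({'att': [30, 40, 50], 'mid': [5, 45, 90]})

# Column labels, exactly like iterating over dict keys.
col_names = [k for k in df_demo]

# (column name, column) pairs, analogous to dict.items().
col_pairs = [(name, col.tolist()) for name, col in df_demo.items()]

# (row label, row) pairs, one row at a time.
rows = [(idx, row['att'], row['mid']) for idx, row in df_demo.iterrows()]
print(col_names, col_pairs, rows)
```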

pd.Series

- Just as a 2-D ndarray corresponds to pd.DataFrame, a 1-D ndarray corresponds to pd.Series.

a=pd.Series(np.random.randn(10))
a
0    0.106173
1    0.723759
2    0.217990
3    0.194022
4   -0.688990
5   -0.351670
6    0.990933
7    1.212147
8   -0.608965
9    0.032549
dtype: float64
type(a)
pandas.core.series.Series
len(a)
10
a.shape
(10,)
for value in a:
    print(value)   # iterates over the values, like a numpy array
0.10617283591748639
0.7237590624253404
0.21798967912700873
0.1940223087322443
-0.6889899757985083
-0.3516696436204985
0.9909329773184973
1.2121468150185186
-0.6089654373693767
0.03254898346416765

pandas study, stage 2

- Data

np.random.seed(43052)
att = np.random.choice(np.arange(10,21)*5,20)
rep = np.random.choice(np.arange(5,21)*5,20)
mid = np.random.choice(np.arange(0,21)*5,20)
fin = np.random.choice(np.arange(0,21)*5,20)
key = ['202212'+str(s) for s in np.random.choice(np.arange(300,501),20,replace=False)]
df=pd.DataFrame({'att':att, 'rep':rep, 'mid':mid, 'fin':fin}, index=key)
df
att rep mid fin
202212380 65 55 50 40
202212370 95 100 50 80
202212363 65 90 60 30
202212488 55 80 75 80
202212312 80 30 30 100
202212377 75 40 100 15
202212463 65 45 45 90
202212471 60 60 25 0
202212400 95 65 20 10
202212469 90 80 80 20
202212318 55 75 35 25
202212432 95 95 45 0
202212443 95 55 15 35
202212367 50 80 40 30
202212458 50 55 15 85
202212396 95 30 30 95
202212482 50 50 45 10
202212452 65 55 15 45
202212387 70 70 40 35
202212354 90 90 80 90

Selecting the first column

- Method 1

df.att 
202212380    65
202212370    95
202212363    65
202212488    55
202212312    80
202212377    75
202212463    65
202212471    60
202212400    95
202212469    90
202212318    55
202212432    95
202212443    95
202212367    50
202212458    50
202212396    95
202212482    50
202212452    65
202212387    70
202212354    90
Name: att, dtype: int64

- Method 2: dict style

df['att']
202212380    65
202212370    95
202212363    65
202212488    55
202212312    80
202212377    75
202212463    65
202212471    60
202212400    95
202212469    90
202212318    55
202212432    95
202212443    95
202212367    50
202212458    50
202212396    95
202212482    50
202212452    65
202212387    70
202212354    90
Name: att, dtype: int64
type(df['att'])
pandas.core.series.Series

- Method 3: dict style

df[['att']]
att
202212380 65
202212370 95
202212363 65
202212488 55
202212312 80
202212377 75
202212463 65
202212471 60
202212400 95
202212469 90
202212318 55
202212432 95
202212443 95
202212367 50
202212458 50
202212396 95
202212482 50
202212452 65
202212387 70
202212354 90
type(df[['att']])
pandas.core.frame.DataFrame
  • df.att 나 df[‘att’]는 series를 리턴하고 df[[‘att’]]는 dataframe을 리턴한다.

- Method 4: ndarray style

df.iloc[:,0] 
202212380    65
202212370    95
202212363    65
202212488    55
202212312    80
202212377    75
202212463    65
202212471    60
202212400    95
202212469    90
202212318    55
202212432    95
202212443    95
202212367    50
202212458    50
202212396    95
202212482    50
202212452    65
202212387    70
202212354    90
Name: att, dtype: int64

- Method 5: ndarray style

df.iloc[:,[0]]
att
202212380 65
202212370 95
202212363 65
202212488 55
202212312 80
202212377 75
202212463 65
202212471 60
202212400 95
202212469 90
202212318 55
202212432 95
202212443 95
202212367 50
202212458 50
202212396 95
202212482 50
202212452 65
202212387 70
202212354 90
  • df.iloc[:,0] returns a Series, while df.iloc[:,[0]] returns a DataFrame.

- Method 6: a mix of ndarray style and dict style

df.loc[:,'att'] 
202212380    65
202212370    95
202212363    65
202212488    55
202212312    80
202212377    75
202212463    65
202212471    60
202212400    95
202212469    90
202212318    55
202212432    95
202212443    95
202212367    50
202212458    50
202212396    95
202212482    50
202212452    65
202212387    70
202212354    90
Name: att, dtype: int64

- Method 7: a mix of ndarray style and dict style

df.loc[:,['att']] 
att
202212380 65
202212370 95
202212363 65
202212488 55
202212312 80
202212377 75
202212463 65
202212471 60
202212400 95
202212469 90
202212318 55
202212432 95
202212443 95
202212367 50
202212458 50
202212396 95
202212482 50
202212452 65
202212387 70
202212354 90
  • df.loc[:,'att'] returns a Series, while df.loc[:,['att']] returns a DataFrame.
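
To see how the position-based and label-based forms line up, here is a minimal sketch on a made-up frame (the values and the three-row size are assumptions for illustration). `Index.get_loc` is used only to show which integer position a label corresponds to.

```python
import pandas as pd

# Hypothetical small frame; column names follow the lecture, values are made up.
df = pd.DataFrame(
    {'att': [65, 95, 65], 'rep': [55, 100, 90], 'mid': [50, 50, 60]},
    index=['202212380', '202212370', '202212363'],
)

pos = df.columns.get_loc('att')    # integer position of the label 'att'

by_pos = df.iloc[:, pos]           # scalar position -> Series
by_label = df.loc[:, 'att']        # scalar label    -> Series
by_pos_df = df.iloc[:, [pos]]      # list of positions -> DataFrame
by_label_df = df.loc[:, ['att']]   # list of labels    -> DataFrame

print(pos)
print(by_pos.equals(by_label), by_pos_df.equals(by_label_df))
```

In both accessors the rule is the same: a scalar selector drops a dimension, while a list selector keeps it.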

- Method 8: ndarray style + bool indexing

df.iloc[:,[True,False,False,False]]
att
202212380 65
202212370 95
202212363 65
202212488 55
202212312 80
202212377 75
202212463 65
202212471 60
202212400 95
202212469 90
202212318 55
202212432 95
202212443 95
202212367 50
202212458 50
202212396 95
202212482 50
202212452 65
202212387 70
202212354 90

- Method 9: a mix of ndarray and dict styles + bool indexing

df.loc[:,[True,False,False,False]]
att
202212380 65
202212370 95
202212363 65
202212488 55
202212312 80
202212377 75
202212463 65
202212471 60
202212400 95
202212469 90
202212318 55
202212432 95
202212443 95
202212367 50
202212458 50
202212396 95
202212482 50
202212452 65
202212387 70
202212354 90
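
Writing the bool list by hand does not scale, so in practice the mask is usually computed from the column labels. The following is a sketch under that assumption, on a made-up two-row frame; comparing `df.columns` with a label yields a boolean ndarray that both `loc` and `iloc` accept.

```python
import pandas as pd

# Hypothetical frame with the lecture's four column names (values are made up).
df = pd.DataFrame(
    {'att': [65, 95], 'rep': [55, 100], 'mid': [50, 50], 'fin': [40, 80]},
    index=['202212380', '202212370'],
)

mask = df.columns == 'att'       # boolean ndarray, one entry per column
picked_loc = df.loc[:, mask]     # same as df.loc[:, [True, False, False, False]]
picked_iloc = df.iloc[:, mask]   # iloc also accepts a boolean ndarray

print(mask)
print(picked_loc)
```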

Selecting multiple columns

- Method 1: dict style

df[['att','fin']]
att fin
202212380 65 40
202212370 95 80
202212363 65 30
202212488 55 80
202212312 80 100
202212377 75 15
202212463 65 90
202212471 60 0
202212400 95 10
202212469 90 20
202212318 55 25
202212432 95 0
202212443 95 35
202212367 50 30
202212458 50 85
202212396 95 95
202212482 50 10
202212452 65 45
202212387 70 35
202212354 90 90

- Method 2: ndarray style (indexing with a list of integers, slicing, striding)

df.iloc[:,[0,1]]  # pass a list of integers to extract columns
att rep
202212380 65 55
202212370 95 100
202212363 65 90
202212488 55 80
202212312 80 30
202212377 75 40
202212463 65 45
202212471 60 60
202212400 95 65
202212469 90 80
202212318 55 75
202212432 95 95
202212443 95 55
202212367 50 80
202212458 50 55
202212396 95 30
202212482 50 50
202212452 65 55
202212387 70 70
202212354 90 90
df.iloc[:,0:2]  # slicing: the end point 2 is excluded, so only columns 0 and 1 are extracted
att rep
202212380 65 55
202212370 95 100
202212363 65 90
202212488 55 80
202212312 80 30
202212377 75 40
202212463 65 45
202212471 60 60
202212400 95 65
202212469 90 80
202212318 55 75
202212432 95 95
202212443 95 55
202212367 50 80
202212458 50 55
202212396 95 30
202212482 50 50
202212452 65 55
202212387 70 70
202212354 90 90
df.iloc[:,2:]  # slicing
mid fin
202212380 50 40
202212370 50 80
202212363 60 30
202212488 75 80
202212312 30 100
202212377 100 15
202212463 45 90
202212471 25 0
202212400 20 10
202212469 80 20
202212318 35 25
202212432 45 0
202212443 15 35
202212367 40 30
202212458 15 85
202212396 30 95
202212482 45 10
202212452 15 45
202212387 40 35
202212354 80 90
df.iloc[:,::2]  # slicing with a stride
att mid
202212380 65 50
202212370 95 50
202212363 65 60
202212488 55 75
202212312 80 30
202212377 75 100
202212463 65 45
202212471 60 25
202212400 95 20
202212469 90 80
202212318 55 35
202212432 95 45
202212443 95 15
202212367 50 40
202212458 50 15
202212396 95 30
202212482 50 45
202212452 65 15
202212387 70 40
202212354 90 80

- Method 3: a mix of ndarray and dict styles

df.loc[:,['att','mid']]
att mid
202212380 65 50
202212370 95 50
202212363 65 60
202212488 55 75
202212312 80 30
202212377 75 100
202212463 65 45
202212471 60 25
202212400 95 20
202212469 90 80
202212318 55 35
202212432 95 45
202212443 95 15
202212367 50 40
202212458 50 15
202212396 95 30
202212482 50 45
202212452 65 15
202212387 70 40
202212354 90 80
df.loc[:,'att':'rep']  # slicing by key includes the end label 'rep'
att rep
202212380 65 55
202212370 95 100
202212363 65 90
202212488 55 80
202212312 80 30
202212377 75 40
202212463 65 45
202212471 60 60
202212400 95 65
202212469 90 80
202212318 55 75
202212432 95 95
202212443 95 55
202212367 50 80
202212458 50 55
202212396 95 30
202212482 50 50
202212452 65 55
202212387 70 70
202212354 90 90
df.loc[:,:'rep']
att rep
202212380 65 55
202212370 95 100
202212363 65 90
202212488 55 80
202212312 80 30
202212377 75 40
202212463 65 45
202212471 60 60
202212400 95 65
202212469 90 80
202212318 55 75
202212432 95 95
202212443 95 55
202212367 50 80
202212458 50 55
202212396 95 30
202212482 50 50
202212452 65 55
202212387 70 70
202212354 90 90
df.loc[:,'rep':]
rep mid fin
202212380 55 50 40
202212370 100 50 80
202212363 90 60 30
202212488 80 75 80
202212312 30 30 100
202212377 40 100 15
202212463 45 45 90
202212471 60 25 0
202212400 65 20 10
202212469 80 80 20
202212318 75 35 25
202212432 95 45 0
202212443 55 15 35
202212367 80 40 30
202212458 55 15 85
202212396 30 30 95
202212482 50 45 10
202212452 55 15 45
202212387 70 40 35
202212354 90 80 90
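
The endpoint behaviour is the main difference between the two slicing styles: `iloc` excludes the end position, while `loc` includes the end label. A small sketch on a made-up frame (values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical frame with the lecture's four column names (values are made up).
df = pd.DataFrame(
    {'att': [65, 95], 'rep': [55, 100], 'mid': [50, 50], 'fin': [40, 80]},
    index=['202212380', '202212370'],
)

by_pos = df.iloc[:, 0:2]            # positions 0 and 1; position 2 is excluded
by_label = df.loc[:, 'att':'rep']   # labels 'att' through 'rep'; 'rep' is included
wider = df.loc[:, 'att':'mid']      # one more column than df.iloc[:, 0:2]

print(list(by_pos.columns))
print(list(by_label.columns))
print(list(wider.columns))
```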

- Method 4: indexing with bools

df.iloc[:,[True,False,True,False]]
att mid
202212380 65 50
202212370 95 50
202212363 65 60
202212488 55 75
202212312 80 30
202212377 75 100
202212463 65 45
202212471 60 25
202212400 95 20
202212469 90 80
202212318 55 35
202212432 95 45
202212443 95 15
202212367 50 40
202212458 50 15
202212396 95 30
202212482 50 45
202212452 65 15
202212387 70 40
202212354 90 80
df.loc[:,[True,False,True,False]]
att mid
202212380 65 50
202212370 95 50
202212363 65 60
202212488 55 75
202212312 80 30
202212377 75 100
202212463 65 45
202212471 60 25
202212400 95 20
202212469 90 80
202212318 55 35
202212432 95 45
202212443 95 15
202212367 50 40
202212458 50 15
202212396 95 30
202212482 50 45
202212452 65 15
202212387 70 40
202212354 90 80
test_ndarray[:,range(2)]
array([[202212380,        65],
       [202212370,        95],
       [202212363,        65],
       [202212488,        55],
       [202212312,        80],
       [202212377,        75],
       [202212463,        65],
       [202212471,        60],
       [202212400,        95],
       [202212469,        90],
       [202212318,        55],
       [202212432,        95],
       [202212443,        95],
       [202212367,        50],
       [202212458,        50],
       [202212396,        95],
       [202212482,        50],
       [202212452,        65],
       [202212387,        70],
       [202212354,        90]])
df.iloc[:,range(2)]
att rep
202212380 65 55
202212370 95 100
202212363 65 90
202212488 55 80
202212312 80 30
202212377 75 40
202212463 65 45
202212471 60 60
202212400 95 65
202212469 90 80
202212318 55 75
202212432 95 95
202212443 95 55
202212367 50 80
202212458 50 55
202212396 95 30
202212482 50 50
202212452 65 55
202212387 70 70
202212354 90 90

Selecting the first row

- Method 1

test_ndarray[0,:]
array([202212380,        65,        55,        50,        40])
test_ndarray[0]
array([202212380,        65,        55,        50,        40])
df.iloc[0]
att    65
rep    55
mid    50
fin    40
Name: 202212380, dtype: int64

- Method 2

df.iloc[[0]]  # returned as a DataFrame
att rep mid fin
202212380 65 55 50 40

- Method 3

df.iloc[0,:]
att    65
rep    55
mid    50
fin    40
Name: 202212380, dtype: int64

- Method 4

df.iloc[[0],:]
att rep mid fin
202212380 65 55 50 40

- Method 5

df.loc['202212380']
att    65
rep    55
mid    50
fin    40
Name: 202212380, dtype: int64

- Method 6

df.loc[['202212380']]  # returned as a DataFrame
att rep mid fin
202212380 65 55 50 40

- Method 7

df.loc['202212380',:]
att    65
rep    55
mid    50
fin    40
Name: 202212380, dtype: int64

- Method 8

df.loc[['202212380'],:]
att rep mid fin
202212380 65 55 50 40

- Method 9

len(df)
20
[True]+[False]*19
[True,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False]
_lst = [True]+[False]*19
df.iloc[_lst]
att rep mid fin
202212380 65 55 50 40
df.iloc[_lst,:]
att rep mid fin
202212380 65 55 50 40
df.loc[_lst]
att rep mid fin
202212380 65 55 50 40
df.loc[_lst,:]
att rep mid fin
202212380 65 55 50 40
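
Instead of building `[True]+[False]*19` by hand, the row mask can be derived from the index. This is a sketch on a made-up four-row frame; the mask length follows `len(df)` automatically.

```python
import pandas as pd

# Hypothetical frame indexed by student id (values are made up).
df = pd.DataFrame(
    {'att': [65, 95, 65, 55], 'rep': [55, 100, 90, 80]},
    index=['202212380', '202212370', '202212363', '202212488'],
)

mask = df.index == '202212380'                 # boolean ndarray of length len(df)
manual = [True] + [False] * (len(df) - 1)      # the hand-built equivalent

first_by_loc = df.loc[mask]
first_by_iloc = df.iloc[mask]

print(first_by_loc)
```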

Selecting multiple rows

- Method 1

df.iloc[[0,2]]
att rep mid fin
202212380 65 55 50 40
202212363 65 90 60 30
df.iloc[[0,2],:]
att rep mid fin
202212380 65 55 50 40
202212363 65 90 60 30

- Method 2

df.loc[['202212380','202212363']]
att rep mid fin
202212380 65 55 50 40
202212363 65 90 60 30
df.loc[['202212380','202212363'],:]
att rep mid fin
202212380 65 55 50 40
202212363 65 90 60 30

- Other approaches

df.iloc[::4]  # striding
att rep mid fin
202212380 65 55 50 40
202212312 80 30 30 100
202212400 95 65 20 10
202212443 95 55 15 35
202212482 50 50 45 10
df.iloc[:5]
att rep mid fin
202212380 65 55 50 40
202212370 95 100 50 80
202212363 65 90 60 30
202212488 55 80 75 80
202212312 80 30 30 100
df.loc[:'202212312']
att rep mid fin
202212380 65 55 50 40
202212370 95 100 50 80
202212363 65 90 60 30
202212488 55 80 75 80
202212312 80 30 30 100
df.att < 80
202212380     True
202212370    False
202212363     True
202212488     True
202212312    False
202212377     True
202212463     True
202212471     True
202212400    False
202212469    False
202212318     True
202212432    False
202212443    False
202212367     True
202212458     True
202212396    False
202212482     True
202212452     True
202212387     True
202212354    False
Name: att, dtype: bool
df.loc[df.att<80]
att rep mid fin
202212380 65 55 50 40
202212363 65 90 60 30
202212488 55 80 75 80
202212377 75 40 100 15
202212463 65 45 45 90
202212471 60 60 25 0
202212318 55 75 35 25
202212367 50 80 40 30
202212458 50 55 15 85
202212482 50 50 45 10
202212452 65 55 15 45
202212387 70 70 40 35
df.loc[list(df.att<80),'rep':]  # converting the mask to a list makes the intent more explicit
rep mid fin
202212380 55 50 40
202212363 90 60 30
202212488 80 75 80
202212377 40 100 15
202212463 45 45 90
202212471 60 25 0
202212318 75 35 25
202212367 80 40 30
202212458 55 15 85
202212482 50 45 10
202212452 55 15 45
202212387 70 40 35
df.loc[df.att<80,'rep':]  # but it also works without converting to a list
rep mid fin
202212380 55 50 40
202212363 90 60 30
202212488 80 75 80
202212377 40 100 15
202212463 45 45 90
202212471 60 25 0
202212318 75 35 25
202212367 80 40 30
202212458 55 15 85
202212482 50 45 10
202212452 55 15 45
202212387 70 40 35
df.iloc[list(df.att<80),1:]
rep mid fin
202212380 55 50 40
202212363 90 60 30
202212488 80 75 80
202212377 40 100 15
202212463 45 45 90
202212471 60 25 0
202212318 75 35 25
202212367 80 40 30
202212458 55 15 85
202212482 50 45 10
202212452 55 15 45
202212387 70 40 35

- The following raises an error: iloc does not accept a boolean Series as a mask.

df.iloc[df.att<80,1:]
ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types
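
One way around this, besides `list(...)`, is to strip the index from the mask with `.to_numpy()` so that `iloc` receives a plain boolean array. A minimal sketch on a made-up frame (the values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical frame indexed by student id (values are made up).
df = pd.DataFrame(
    {'att': [65, 95, 65, 55, 80],
     'rep': [55, 100, 90, 80, 30],
     'mid': [50, 50, 60, 75, 30]},
    index=['202212380', '202212370', '202212363', '202212488', '202212312'],
)

mask = df.att < 80                            # boolean Series (carries an index)

via_loc = df.loc[mask, 'rep':]                # loc accepts the Series directly
via_iloc = df.iloc[mask.to_numpy(), 1:]       # iloc needs a plain boolean array
via_list = df.iloc[list(mask), 1:]            # the list form used in the lecture

print(via_iloc)
```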

query (important!!!)

- Example 1

df.query('att==90 and mid>30')
att rep mid fin
202212469 90 80 80 20
202212354 90 90 80 90

- Example 2

df.query('att<rep and mid<fin')
att rep mid fin
202212370 95 100 50 80
202212488 55 80 75 80
202212458 50 55 15 85

- Example 3

df.query('att<rep<80')
att rep mid fin
202212318 55 75 35 25
202212458 50 55 15 85

- Example 4

df.query('50<att<=90 and mid<fin')
att rep mid fin
202212488 55 80 75 80
202212312 80 30 30 100
202212463 65 45 45 90
202212452 65 55 15 45
202212354 90 90 80 90

- Example 5

df.query('(mid+fin)/2 >= 60')
att rep mid fin
202212370 95 100 50 80
202212488 55 80 75 80
202212312 80 30 30 100
202212463 65 45 45 90
202212396 95 30 30 95
202212354 90 90 80 90

- Example 6

_mean = df.att.mean()
_mean
73.0
df.query('att>=73')
att rep mid fin
202212370 95 100 50 80
202212312 80 30 30 100
202212377 75 40 100 15
202212400 95 65 20 10
202212469 90 80 80 20
202212432 95 95 45 0
202212443 95 55 15 35
202212396 95 30 30 95
202212354 90 90 80 90
df.query('att>=_mean')  # raises UndefinedVariableError: query does not see the Python variable _mean
UndefinedVariableError: name '_mean' is not defined
df.query('att>=@_mean')   # prefixing the variable with @ avoids the error
att rep mid fin
202212370 95 100 50 80
202212312 80 30 30 100
202212377 75 40 100 15
202212400 95 65 20 10
202212469 90 80 80 20
202212432 95 95 45 0
202212443 95 55 15 35
202212396 95 30 30 95
202212354 90 90 80 90
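
The `@` prefix tells `query` to look the name up among the surrounding Python variables rather than among the columns. The sketch below, on a made-up frame with a hypothetical variable name `threshold`, checks that the `@` form selects the same rows as an ordinary boolean mask.

```python
import pandas as pd

# Hypothetical frame indexed by student id (values are made up).
df = pd.DataFrame(
    {'att': [65, 95, 80, 50], 'rep': [55, 100, 30, 80]},
    index=['202212380', '202212370', '202212312', '202212367'],
)

threshold = df.att.mean()                     # plain Python variable, not a column

via_query = df.query('att >= @threshold')     # @ refers to the Python variable
via_mask = df.loc[df.att >= threshold]        # equivalent boolean-mask form

print(threshold)
print(via_query)
```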

- Example 7

df.query('index <= "202212354"')
att rep mid fin
202212312 80 30 30 100
202212318 55 75 35 25
202212354 90 90 80 90
df.query('index <= "202212354" or index=="202212387"')  # the outer quotes can be double and the inner ones single instead
att rep mid fin
202212312 80 30 30 100
202212318 55 75 35 25
202212387 70 70 40 35
202212354 90 90 80 90

This feature really shines with time series data.

- Example 8

pd.date_range('20230101',periods=10)
DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
               '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08',
               '2023-01-09', '2023-01-10'],
              dtype='datetime64[ns]', freq='D')
_df=pd.DataFrame(np.random.normal(size=(10,4)),columns=list('ABCD'), index=pd.date_range('20230101',periods=10))
_df
A B C D
2023-01-01 -0.259429 0.369731 -0.279944 0.099409
2023-01-02 -0.932515 -0.311629 0.828348 -0.225257
2023-01-03 -0.011607 0.927334 -0.753145 1.013249
2023-01-04 -1.050379 -0.323094 0.813898 1.035724
2023-01-05 -0.921175 0.513109 -0.905361 0.893707
2023-01-06 -1.521594 0.856883 -0.401441 -1.111551
2023-01-07 0.958028 -0.015302 0.891259 -0.826834
2023-01-08 1.822226 -1.258543 -0.705506 -0.519831
2023-01-09 -0.593394 -1.399224 -1.616172 -0.626952
2023-01-10 -0.083539 0.528519 0.051522 0.126757
_df.query("'2023-01-02' < index <= '2023-01-09'")
A B C D
2023-01-03 -0.011607 0.927334 -0.753145 1.013249
2023-01-04 -1.050379 -0.323094 0.813898 1.035724
2023-01-05 -0.921175 0.513109 -0.905361 0.893707
2023-01-06 -1.521594 0.856883 -0.401441 -1.111551
2023-01-07 0.958028 -0.015302 0.891259 -0.826834
2023-01-08 1.822226 -1.258543 -0.705506 -0.519831
2023-01-09 -0.593394 -1.399224 -1.616172 -0.626952
_df.query("'2023-01-02' < index <= '2023-01-09' and A+B<C")
A B C D
2023-01-04 -1.050379 -0.323094 0.813898 1.035724
2023-01-06 -1.521594 0.856883 -0.401441 -1.111551
2023-01-09 -0.593394 -1.399224 -1.616172 -0.626952
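
For comparison, a date-range selection like the one above can also be written without `query`, either as a label slice on the `DatetimeIndex` or as an explicit mask. This is a sketch with its own randomly generated frame (the seed and values are assumptions, not the lecture's numbers); note that a `loc` slice includes both end dates, so the strict lower bound becomes the next day.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # hypothetical seed, only for reproducibility
_df = pd.DataFrame(
    rng.normal(size=(10, 4)),
    columns=list('ABCD'),
    index=pd.date_range('20230101', periods=10),
)

# Label slicing on a DatetimeIndex includes both end points.
via_slice = _df.loc['2023-01-03':'2023-01-09']

# Explicit mask: strictly after Jan 2, up to and including Jan 9.
idx = _df.index
via_mask = _df.loc[(idx > pd.Timestamp('2023-01-02')) & (idx <= pd.Timestamp('2023-01-09'))]

print(via_slice.index.min(), via_slice.index.max(), len(via_slice))
```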

- query is not a cure-all.

df.columns = pd.Index(['att score', 'rep score', 'mid score','fin score'])
df.query("att score < 90")  # a space in the column name causes an error
SyntaxError: invalid syntax (<unknown>, line 1)
df.att score
SyntaxError: invalid syntax (<ipython-input-285-4116dfe6888b>, line 1)
df.loc[df["att score"] < 90, :]  # this works
att score rep score mid score fin score
202212380 65 55 50 40
202212363 65 90 60 30
202212488 55 80 75 80
202212312 80 30 30 100
202212377 75 40 100 15
202212463 65 45 45 90
202212471 60 60 25 0
202212318 55 75 35 25
202212367 50 80 40 30
202212458 50 55 15 85
202212482 50 50 45 10
202212452 65 55 15 45
202212387 70 70 40 35
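
As a side note, pandas (0.25 and later) also lets `query` refer to column names containing spaces by wrapping them in backticks. The sketch below uses a made-up frame to check that the backtick form matches the `loc` form shown above.

```python
import pandas as pd

# Hypothetical frame whose column names contain spaces (values are made up).
df = pd.DataFrame(
    {'att score': [65, 95, 80], 'rep score': [55, 100, 30]},
    index=['202212380', '202212370', '202212312'],
)

via_query = df.query('`att score` < 90')        # backticks quote the column name
via_loc = df.loc[df['att score'] < 90, :]       # the loc form from the lecture

print(via_query)
```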

Studying pandas: stage 3

Transpose

ndarray = np.arange(2*3).reshape(2,3)
df=pd.DataFrame(ndarray)
df
0 1 2
0 0 1 2
1 3 4 5
ndarray.T
array([[0, 3],
       [1, 4],
       [2, 5]])
df.T
0 1
0 0 3
1 1 4
2 2 5

sum

ndarray.sum(axis=0)
array([3, 5, 7])
df.sum(axis=0)
0    3
1    5
2    7
dtype: int64
ndarray.sum(axis=1)
array([ 3, 12])
df.sum(axis=1)
0     3
1    12
dtype: int64

cumsum

df
0 1 2
0 0 1 2
1 3 4 5
ndarray.cumsum(axis=0)  # running total down each column
array([[0, 1, 2],
       [3, 5, 7]])
df.cumsum(axis=0)
0 1 2
0 0 1 2
1 3 5 7
ndarray.cumsum(axis=1)
array([[ 0,  1,  3],
       [ 3,  7, 12]])
df.cumsum(axis=1)
0 1 2
0 0 1 3
1 3 7 12
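
The relationship between `cumsum` and `sum` can be checked on the same 2x3 array: the last entry of each running total along an axis is the plain sum along that axis. A short sketch:

```python
import numpy as np
import pandas as pd

ndarray = np.arange(2 * 3).reshape(2, 3)
df = pd.DataFrame(ndarray)

# Last row of the column-wise running total equals the column sums.
last_row = df.cumsum(axis=0).iloc[-1]
# Last column of the row-wise running total equals the row sums.
last_col = df.cumsum(axis=1).iloc[:, -1]

print(last_row.tolist(), df.sum(axis=0).tolist())
print(last_col.tolist(), df.sum(axis=1).tolist())
```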

Type conversion

ndarray.tolist()
[[0, 1, 2], [3, 4, 5]]
df.to_numpy()
array([[0, 1, 2],
       [3, 4, 5]])
df.to_numpy().tolist()
[[0, 1, 2], [3, 4, 5]]
df.to_dict()
{0: {0: 0, 1: 3}, 1: {0: 1, 1: 4}, 2: {0: 2, 1: 5}}
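
By default `to_dict` nests the result as column -> row -> value. When a different layout is more convenient, the `orient` argument changes it; the sketch below shows two of the documented options on the same 2x3 frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(2 * 3).reshape(2, 3))

as_lists = df.to_dict(orient='list')       # column -> list of values
as_records = df.to_dict(orient='records')  # one dict per row

print(as_lists)
print(as_records)
```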

Studying pandas: stage 4 (omitted)

Homework

- Extract the 1st and 3rd columns from the DataFrame below.

df= pd.DataFrame({'att':[90,90,95],'rep':[80,90,90],'mid':[50,60,70], 'fin':[70,80,50]})
df
att rep mid fin
0 90 80 50 70
1 90 90 60 80
2 95 90 70 50
df[['att','mid']]
att mid
0 90 50
1 90 60
2 95 70
df.iloc[:,[0,2]]
att mid
0 90 50
1 90 60
2 95 70
df.iloc[:,::2]
att mid
0 90 50
1 90 60
2 95 70
df.loc[:,['att','mid']]
att mid
0 90 50
1 90 60
2 95 70